ABSTRACT


Low image contrast limits the amount of information conveyed to the user. 
With the proliferation of digital imagery and computer interfaces between man 
and machine, it is now viable to consider digitally enhancing the image before 
presenting it to the user, thus increasing the information throughput. This thesis 
explores the effect of the Contrast Limited Adaptive Histogram Equalization 
(CLAHE) process on night vision and thermal images. With better contrast, target 
detection and discrimination can be improved. The contrast enhancement by 
CLAHE is visually significant and details are easier to detect with the higher 
image contrast. Analyzing the image frequency response reveals increases in the 
higher spatial frequencies. As higher frequencies correspond to image edges, the 
power increase is viewed as corresponding to edge enhancement and hence, an 
increase in visible image details. This edge enhancement is perceived as 
improvement in image quality. This is further substantiated by subjective 
testing, in which a majority of human subjects agreed that CLAHE-enhanced 
images are more informative than the original night vision images. 
I. INTRODUCTION 


A. BACKGROUND 

The element of surprise has long been touted as the main tactical 
advantage that would turn the tide of a battle. Throughout history, commanders 
have employed the darkness of night to gain surprise and to grasp the initiative 
from the hands of the enemy. Yet, while night operations have progressed from 
nocturnal maneuvers to the more recent firefights in Afghanistan and the 
“24-hour battlefield”, difficulties associated with night operations still plague all 
commanders, particularly the ability to see clearly and the ability to differentiate 
friend from foe. The fact remains that darkness is "a double-edged weapon", and 
like terrain, "it favors the one who best uses it and hinders the one who does 
not." [Sasso, 1982]. 

Human beings are visual and non-nocturnal creatures by nature. Not 
gifted with any special or hyper-sensitive sensory organs, they rely more on their 
ability to see than on any of the other four senses (smell, hearing, touch and taste) 
to understand and manipulate their surroundings. The cone and rod 
photoreceptors in the human eye are responsible for vision. The rods are more 
numerous than the cones and, in low levels of illumination, more than one 
thousand times more sensitive. They are the main contributors to our limited night, 
or scotopic, vision. However, unlike the cones, the rods are not sensitive to color, 
i.e. they only generate monochrome images. Hence, objects that appear brightly 
colored in daylight appear as colorless forms when seen under moonlight, 
because only the rods are stimulated. 

In the absence of artificial light sources, the main source of natural 
illumination at night comes from the moon and to a lesser degree, the stars 
(estimated at one-tenth of a quarter moon). The illuminance ranges 
from 0.1 lux (full moon) to 0.0001 lux (overcast night) [Sampson, 1996]. 
Depending on the reflectivity of the objects, the eventual irradiance on the human 
eye may not be high enough to even stimulate the rods. 
However, if we explore beyond the visible light spectrum (400-700 nm), 
the Infra-Red (IR) spectrum offers possibilities for exploitation, as shown in 
Figures 1 and 2. Both the night sky irradiance and the foliage reflectivity are higher in 
the Near Infra-Red (NIR) band, i.e. there is more light energy in this wavelength 
band. 



Figure 1: Natural night sky spectral irradiance versus wavelength (nm), showing a higher irradiance in the NIR band [From Korry, 2003].



Figure 2: Foliage reflectivity versus wavelength (nm): foliage is a better reflector in the IR band [From Korry, 2003].
















Hence, if we were able to “sense” IR or near-IR radiation (which the 
human photoreceptors are unable to do naturally), our night vision capability 
would be immediately improved, given the higher luminance available. 


B. NIGHT VISION 

There are two basic methods to improve night vision. The first is to 
increase the amount of visible light reaching the eye, as with artificial lighting 
such as a flashlight or by converting the “otherwise-invisible” radiation to visible 
radiation. The second is through light amplification, i.e. by increasing the 
normally imperceptible radiation energy to a level detectable by the human eyes. 
These methods of achieving night imagery are employed by the Image Intensifier 
(II) and the Thermal Imager (TI). 

1. Image Intensifier 

As the name implies, Image Intensifiers (II) are designed to boost very low 
intensity optical images to the point where they become perceivable to the 
human eye. They also act as wavelength “down-converters”, that is, they convert 
near-IR radiation into visible radiation. II devices are commonly known as Night 
Vision Devices (NVDs) or Night Vision Goggles (NVGs), depending on the mode of 
usage. 



Figure 3: A Night Vision Device with the light-amplifying microchannel plate [From Korry, 2003].
A typical II system consists of three main components: the photocathode, 
the micro-channel plate (MCP) and the phosphor screen, as shown in Figure 3. 
Reflected light from the scene or object enters the device and is focused onto the 
photocathode by an optical lens system. Photons striking the photocathode 
surface release photo-electrons. The flux of photo-electrons generated is 
proportional to the flux of incident light photons and the responsivity of the 
photocathode. In first-generation NVDs, the energy of the photo-electrons is 
increased by acceleration with an externally applied electric field. 
Second-generation devices make use of the MCP to achieve energy gain through 
electron multiplication: the number of photo-electrons is multiplied by 
accelerating the electrons through the MCP, where an “avalanche” of secondary 
electrons is produced as a result of collisions between the electrons and the 
MCP walls. On emerging from the MCP, the electrons strike a phosphor screen, 
which emits visible light, hence creating an image visible to the human eye. The 
most commonly used phosphor is KA (P20), as it emits a greenish light at 560 nm, 
matching the peak sensitivity of the human eye. Furthermore, P20 has a fast 
decay time and a high conversion efficiency, which make it well suited to night vision 
[Ji, 2002]. 

The newer generation (Gen III) of NVDs uses a Gallium Arsenide (GaAs) 
photocathode, which is sensitive to light beyond 800 nm, where the night sky 
irradiance is also higher (Figure 1). The MCP used in third-generation 
NVDs also has a much smaller channel pitch, giving better spatial 
resolution. As a result, Gen III NVDs can deliver a three-fold improvement in 
visual acuity and detection distance over the earlier generations. The light 
amplification achievable can be 30,000 times or more [LCEO, 2003]. 
2. Thermal Imager 

All material objects with temperatures above absolute zero (0 K) radiate 
infrared energy. A Thermal Imager (TI) detects this radiation (including reflected 
infrared energy) and converts this energy into a visible presentation. The 
most common class of TI systems is the Forward-Looking Infrared (FLIR) system. A 
system operating in the 8- to 14-µm region is usually referred to as an LWIR 
(long-wavelength infrared) FLIR, and one operating in the 3- to 5-µm region as an MWIR 
(medium-wavelength infrared) FLIR. These are the two transmission windows 
where atmospheric attenuation of infrared radiation is minimal. 

Most IR detectors operate using quantum mechanical interaction between 
incident photons and detector material. Photoconductive detectors absorb 
photons to elevate electrons from the valence band to the conduction band of the 
material, changing the conductivity of the detector. Photovoltaic detectors absorb 
photons to create electron-hole pairs across a p-n junction, which produces a 
small current. Such devices can be manufactured as part of an array that 
includes a capacitor that stores a charge proportional to the incident radiation. 
The charged array can then be read or scanned to produce the corresponding 
image. 

As the TI senses temperature difference or contrast (sensitivity is 
frequently defined in terms of Minimum Resolvable Temperature Difference), 
detectors with small band-gap energies must be cooled to minimize thermally 
generated carriers and inherent detector noise. 

The bolometer is a thermal detector that absorbs thermal energy over all 
wavelengths and changes its resistance accordingly. The change in resistance 
will produce a change in electric current which can be monitored. The radiation to 
the bolometer is usually modulated to improve sensitivity and uniformity [Holst, 
2003]. 
C. CONTRAST SENSITIVITY 

The difference in radiation intensity levels (both emitted and reflected) 
from a scene creates the information contained within an image. An object of 
interest can be identified by its contrast against its immediate surroundings, 
which defines the object’s boundaries and edges. Contrast is defined as the 
difference in luminance or radiation intensity levels between regions or pixels. 

The larger the contrast, the more easily an object can be detected in the 
scene. This can be illustrated by Figure 4, a Contrast Sensitivity Function (CSF) 
test image produced by Campbell and Robson in 1968. 



Figure 4: Contrast Sensitivity Function test chart by Campbell-Robson [From McCourt, 2003].

In Figure 4 above, spatial frequency increases from left to right (the bars 
become thinner and thinner) and contrast decreases from bottom to top 
(difference in gray level between the bars and background decreases). From a 
fixed viewing distance, note the contrast values where the bars are just barely 
visible over the range of spatial frequencies. Trace these out to form an inverted 
U-shaped curve; this curve represents your contrast sensitivity function. The 
region below the U-shaped curve is the visible-stimulus region, where objects with 
such combinations of spatial frequency and contrast will be detectable by the eye. 
The CSF of a typical adult human is shown in Figure 5 for reference. The 
influence of contrast on visible stimulus and object detection is evident. 
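The chart-reading procedure above can be made concrete by synthesizing a chart with the same layout. The following is a minimal NumPy sketch, not the original Campbell-Robson stimulus: the function name `campbell_robson_chart`, the image size, the exponential frequency sweep and the contrast range are all illustrative assumptions.

```python
import numpy as np

def campbell_robson_chart(height=256, width=512, f_min=0.5, f_max=64.0,
                          c_max=1.0, c_min=0.005):
    """Hypothetical Campbell-Robson style chart as an 8-bit grayscale image.

    Spatial frequency (cycles across the full width) grows exponentially from
    left to right; contrast falls exponentially from bottom to top.
    """
    x = np.linspace(0.0, 1.0, width)
    # Exponential chirp: instantaneous frequency f(x) = f_min * (f_max / f_min)**x,
    # so the phase is the integral of 2*pi*f(x) over x.
    k = np.log(f_max / f_min)
    phase = 2.0 * np.pi * f_min * (np.exp(k * x) - 1.0) / k
    # Row 0 is the top of the image: lowest contrast there, highest at the bottom.
    y = np.linspace(0.0, 1.0, height)[:, None]
    contrast = c_min * (c_max / c_min) ** y
    luminance = 0.5 + 0.5 * contrast * np.sin(phase)[None, :]
    return np.round(255.0 * luminance).astype(np.uint8)
```

Displaying the returned array at a fixed viewing distance and noting where the bars fade into uniform gray traces out the inverted-U boundary described above.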



Figure 5: CSF of an adult human (horizontal axis: spatial frequency in cycles/degree). Contrast sensitivity is defined as the inverse of the contrast threshold, which is the minimum contrast level needed to see the grating in the test image [From McCourt, 2001].


1. II Imagery 

Figure 6 is a typical II image obtained by an NVD or NVG. As discussed in 
the previous section, the low illuminance, coupled with low reflectivity from the 
ground and foliage, generates a low-contrast image with a limited dynamic 
range. Detector noise and clutter from the background degrade the image 
further. Figure 6 also shows a lack of detail and contrast in the ground before 
the treeline, which are essential for situational awareness and navigation. 
However, the upper portion of the image has better contrast due to illumination 
by the night sky (from moon and stars). In this more illuminated region, the 
foliage can be differentiated, as the objects fall within the visible region of the 
CSF. 
Figure 6: An NVD image [From Naval Research Laboratory (NRL)].



Figure 7: A TI image [From NRL].
2. TI Imagery 

Figure 7 is a FLIR or TI image of the same scene as Figure 6. The 
temperature difference between the regions (due to different cooling rates of the 
earth or soil) generates sufficient contrast to see the layout of the ground before 
the treeline. The warm air and the low emissivity of the trees also create the 
sharp contrast cues of the treeline against the sky (the treeline appears darker). 
However, for areas of homogeneity in temperature or emissivity (such as the 
foliage of individual trees), there is a lack of contrast or surface information, as 
evidenced by the “hollow appearance” of the foliage. Note that IIs do not have 
this problem, as they detect the reflected radiation from the surface of the 
objects. Hence, the information contained in the II and TI images is 
complementary, since the sensors operate in different bands of the 
electromagnetic spectrum. This provides the impetus for image or sensor fusion 
to improve image quality and content [Scrofani, 1997]. 

3. Comparison of TI and II Imagery 

In a military context, the object of interest tends to be either man-made or 
alive. Such objects will usually be warmer than their surroundings, due to body heat 
or some other energy-generating process. Without solar heating, the air and the 
earth cool down during the night. Hence, all these objects of interest will contrast 
easily against the background and stand out in a TI image, unless there is deliberate 
action to reduce the temperature contrast (such as camouflaging or shielding). In 
comparison, II depends greatly on ambient light (artificial or natural) for visibility, 
as it amplifies reflected incoming light. Therefore, in a totally dark room, the II will 
not be able to generate any image at all, whilst the TI is still able to “see”, 
provided that there are temperature gradients present. The TI also has a better 
ability to see through smoke, rain and snow, as the longer-wavelength IR 
radiation is able to propagate in the presence of such atmospheric particles with 
minimal attenuation, unlike shorter visible and near-IR radiation, which would be 
scattered. As a result, the detection range of a TI tends to be greater than that of an II. 
As an II intensifies and amplifies incoming light, there is a possibility of 
“overloading” the II detector with a bright or high-luminance source, which could 
temporarily “black out” the sensor, similar to human vision when stepping out 
from a dark room into bright sunlight. The II is designed to “see” at night, when 
the illuminance level is low (0.1 lux or lower). Hence, a source with an intensity 
level a couple of orders of magnitude higher is sufficient to overload the II 
baseline sensitivity (a handheld flashlight is capable of producing 100 lux or 
more). Although the MCP amplifier generally has a non-linear response which 
reduces gain at high irradiance, this is still insufficient to isolate bright 
sources and avoid such saturation. Figure 8 is a representation of this 
“over-exposure” pitfall of the II caused by a light source. 



Figure 8: An II image degraded by over-exposure [From NRL].
Figure 9: A TI image of the same scene as Figure 8, displaying better contrast and more detail than the II image [From NRL].

Given the tactical advantages of TI and the shortcomings of II, there is 
therefore a general preference for TI as the night vision sensor of choice for 
detection. However, II still has a slight advantage in identification, because of its 
ability to sense surface differences from their reflectivity. The relatively lower cost 
and compactness of II systems also make them attractive for field deployment, as, 
unlike TI systems, they do not require a cooling system for better sensitivity. 

In general, due to limited reflectivity characteristics from the scene, the 
quality of II images is hampered by lower contrast. It is difficult to discriminate 
objects from the background and clutter. From the previous section, increasing 
the contrast increases the visible stimulus and the probability of detection, as 
demonstrated by the Contrast Sensitivity Function (CSF) in Figure 5. Therefore, 
the usability of II systems for detection will be enhanced if the contrast of II 
images can be improved or their dynamic range expanded, without altering the 
spatial content of the original image. 







D. OBJECTIVE 


Image enhancement techniques to improve visual quality have been 
popularized with the proliferation of digital imagery and computers. Techniques 
include noise filtering, edge enhancement, color balance and contrast 
enhancement, in both the frequency and spatial domains. Even in word processor 
software such as Microsoft Word, there are features or tool options to manipulate 
contrast and brightness levels of images. 

Computer-aided operation is also becoming a necessity, even in the 
military. Advanced systems and arms modernization programs often involve the 
integration of a computer or a computer processing interface to reduce the 
combat loading on the soldier or improve system reaction time. One prime 
example is the Land Warrior program [FAS website, 2003], where 
communications, sensors, and materials are integrated into a complete soldier 
system. At the heart of this soldier system is a computer module or subsystem 
which integrates all the information and sensors together before presenting them to the 
soldier via a helmet-mounted display. The electro-optical sensors include a thermal 
weapon sight, an image intensifier, a video camera (visible) and a laser range-finder. 
Electro-optical sensors are also generally transitioning from direct view to remote 
display, which provides a possibility for enhancement. 

Taken together, these two developments make it feasible to digitally 
enhance night vision images with a computer algorithm before presenting them to 
the user, particularly a military one. Images acquired from the night vision device 
can be easily digitized by coupling the sensor output screen to a scanning array 
or an Analog-to-Digital converter. Next, the digital image can undergo a contrast 
enhancement algorithm, such as the Contrast-Limited Adaptive Histogram 
Equalization (CLAHE) to improve its visible scene content, while maintaining the 
spatial relation of the original image, before displaying the final improved image 
to the human user. 

II systems and images would benefit most from such a contrast 
enhancement because of their inherent low contrast limitation. The II system 
would be given a new lease of life, and a new “light” so to speak, if the quality of II 
images can be improved significantly by the proposed algorithm. Furthermore, no 
major modification is required on the II system, since the enhancement is done by 
a software algorithm. 

This thesis explores the effect of such an image enhancement algorithm 
on the night vision image. Chapter II briefly reviews the fundamentals of digital 
image processing and the CLAHE process, while Chapter III analyses the 
enhancement results obtained with the CLAHE process. Finally, Chapter IV 
presents the conclusions and recommendations for further research. 





II. DIGITAL IMAGE PROCESSING 


A. DIGITAL IMAGE 

A digital image is essentially a two-dimensional array of light-intensity 
levels, which can be denoted by f(x,y), where the value or amplitude of f at 
spatial coordinates (x,y) gives the intensity of the image at that point. The 
intensity is a measure of the relative “brightness” of each point. The brightness 
level is represented by a series of discrete intensity shades from darkest to 
brightest, for a monochrome (single color) digital image. These discrete intensity 
shades are usually referred to as the “gray levels”, with black representing the 
darkest level and white, the brightest level. These levels will be encoded in terms 
of binary bits in the digital domain, and the most commonly used encoding 
scheme is the 8-bit display with 256 levels of brightness or intensity, starting from 
level 0 (black) to 255 (white). The digital image can therefore be conveniently 
represented and manipulated as an N (number of rows) x M (number of columns) 
matrix, with each element containing a value between 0 and 255 (for an 8-bit 
monochrome image), i.e. 


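$$
f(x,y) = \begin{bmatrix}
f(0,0) & f(0,1) & \cdots & f(0,M-1) \\
f(1,0) & f(1,1) & \cdots & f(1,M-1) \\
\vdots & \vdots & \ddots & \vdots \\
f(N-1,0) & f(N-1,1) & \cdots & f(N-1,M-1)
\end{bmatrix},
$$

where $0 \le f(x,y) \le 255$. 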


Different colors are created by mixing different proportions of the 3 primary 
colors: red, green and blue, i.e. RGB for short. Hence, a color image is 
represented by an N x M x 3 three-dimensional matrix, with each layer 
representing the gray-level distribution of one primary color in the image. 
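The matrix and color-layer representations above map directly onto NumPy arrays. This short sketch (array sizes and values are arbitrary) shows an 8-bit monochrome image as an N x M `uint8` array indexed as f[x, y] with x as the row, and a color image as an N x M x 3 array.

```python
import numpy as np

N, M = 4, 6  # rows x columns (illustrative size)

# 8-bit monochrome image: an N x M matrix of gray levels, 0 (black) to 255 (white).
f = np.zeros((N, M), dtype=np.uint8)
f[1:3, 2:5] = 200   # a bright rectangular "object" on a black background
print(f[1, 2])      # f(x=1, y=2): x indexes rows, y indexes columns -> 200

# Color image: an N x M x 3 array, one gray-level layer per primary (R, G, B).
rgb = np.zeros((N, M, 3), dtype=np.uint8)
rgb[..., 1] = f     # place the object in the green layer only
print(rgb.shape)    # (4, 6, 3)

# uint8 arithmetic wraps around (e.g. 200 + 100 -> 44), so convert to a wider
# type before computing differences or sums of gray levels.
diff = f.astype(np.int16) - 100
```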

Each point in the image denoted by the (x,y) coordinates is referred to as 
a pixel. The pixel is the smallest cell of information in the image. It contains a 
value of the intensity level corresponding to the detected irradiance. Therefore, 
the pixel size defines the resolution and acuity of the image seen. Each individual 
detector in the sensor array and each dot on the LCD (liquid crystal display) 
screen contributes one pixel of the image. There is actually a 
physical separation distance between pixels due to finite manufacturing 
tolerance. However, these separations are not detectable, as the human eye is 
unable to resolve such small details at normal viewing distance (refer to 
Rayleigh’s criterion for resolution of diffraction-limited images [Pedrotti, 1993]). 
For simplicity, digital images are represented by an array of square pixels. 

The relation between pixels constitutes the information contained in an 
image. A pixel at coordinates (x,y) has eight immediate neighbors which are a 
unit distance away: 


(x-1, y-1)   (x-1, y)   (x-1, y+1)
(x, y-1)     (x, y)     (x, y+1)
(x+1, y-1)   (x+1, y)   (x+1, y+1)

Figure 10: Neighbors of a pixel. Note the direction of the x and y coordinates used.


Pixels can be connected to form boundaries of objects or components of 
regions in an image when the gray levels of adjacent pixels satisfy a specified 
criterion of similarity (equal or within a small difference). The difference in the 
gray levels of two adjacent pixels gives the contrast needed to differentiate 
between regions or objects. This difference has to be of a certain magnitude in 
order for the human eye to identify it as a boundary. 
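The neighbor and connectivity ideas above can be sketched in a few lines. In this illustration, `neighbors8` lists the in-bounds 8-neighbors of Figure 10, and `connected_region` grows a region from a seed pixel using one possible similarity criterion: a gray-level difference from the seed of at most `tol`. The function names and the fixed tolerance are assumptions for illustration only.

```python
import numpy as np
from collections import deque

def neighbors8(x, y, n_rows, n_cols):
    """Return the in-bounds 8-neighbors of pixel (x, y); x is the row, y the column."""
    result = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue
            nx, ny = x + dx, y + dy
            if 0 <= nx < n_rows and 0 <= ny < n_cols:
                result.append((nx, ny))
    return result

def connected_region(img, seed, tol=10):
    """Collect pixels 8-connected to `seed` whose gray level differs from the
    seed's gray level by at most `tol` (the assumed similarity criterion)."""
    img = np.asarray(img, dtype=np.int32)   # avoid uint8 wrap-around in differences
    ref = img[seed]
    region = {seed}
    queue = deque([seed])
    while queue:
        x, y = queue.popleft()
        for p in neighbors8(x, y, *img.shape):
            if p not in region and abs(img[p] - ref) <= tol:
                region.add(p)
                queue.append(p)
    return region
```

A large gray-level jump between adjacent pixels stops the growth, which is exactly the boundary that the eye perceives between regions.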
B. IMAGE PROCESSING METHODS 


There are two main methods to process an image as defined by the 
domain in which the image is processed, namely the spatial domain or the 
frequency domain. The spatial domain refers to the image plane itself, and 
approaches in this category are based on direct manipulation of pixels in an 
image. Frequency domain processing techniques are based on modifying the 
spatial frequency spectrum of the image as obtained by the Fourier transform. 
Enhancement techniques based on various combinations of methods from these 
two categories are not unusual and the same enhancement technique can also 
be implemented in both domains, yielding identical results [Gonzalez and Woods, 
1993]. 


1. Spatial Domain Methods 

The spatial domain refers to the aggregate of pixels composing an image, 
and spatial domain methods are procedures that operate directly on these pixels. 
Image processing functions in the spatial domain may be expressed as: 

$$ g(x,y) = T\left[ f(x,y) \right], \tag{1} $$

where f(x,y) is the input image data, g(x,y) is the processed image data, 
and T is an operator on f, defined over some neighborhood of (x,y). In addition, T 
can also operate on a set of input images, for example performing the 
pixel-by-pixel sum and averaging of a number of images for noise reduction. 
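As an illustration of T operating on a set of images, the following sketch averages several frames pixel by pixel. The noise model is an assumption made only for the demonstration: a flat scene at gray level 100 corrupted by independent zero-mean Gaussian noise with a standard deviation of 20 gray levels.

```python
import numpy as np

def average_frames(frames):
    """Pixel-by-pixel mean of equally sized frames, rounded back to 8-bit range."""
    stack = np.stack([np.asarray(fr, dtype=np.float64) for fr in frames])
    return np.clip(np.round(stack.mean(axis=0)), 0, 255).astype(np.uint8)

# Hypothetical test scene: a flat gray level of 100 observed 16 times with
# additive zero-mean Gaussian noise (standard deviation 20 gray levels).
rng = np.random.default_rng(0)
clean = np.full((64, 64), 100.0)
frames = [np.clip(clean + rng.normal(0.0, 20.0, clean.shape), 0, 255) for _ in range(16)]
avg = average_frames(frames)
```

For uncorrelated zero-mean noise, averaging K frames reduces the noise standard deviation by a factor of about sqrt(K), here from roughly 20 to roughly 5 gray levels.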

The principal approach to defining a neighborhood about (x,y) is to use a 
square or rectangular mask centered at (x,y). The center of this mask or window 
is moved from pixel to pixel, and the operator applied at each location (x,y) to 
yield the corresponding g for that location. The resultant g(x,y) is stored 
separately, instead of changing pixel values in place, to avoid a “snow-balling” 
effect of the altered gray levels. 
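The sliding-mask procedure described above can be sketched as follows. The helper name `apply_neighborhood_op` and the edge-replication border handling are assumptions of this sketch; the operator `op` can be any function of the 3x3 neighborhood, such as a mean or median. Results are written to a separate array g, so pixels that have already been processed never feed back into later windows.

```python
import numpy as np

def apply_neighborhood_op(f, op, size=3):
    """Slide a size x size window over f and apply `op` to each neighborhood.

    Results go into a separate output array g, so already-processed values
    never feed back into later windows. Border pixels are handled by
    replicating the edge values (one of several possible choices).
    """
    f = np.asarray(f, dtype=np.float64)
    r = size // 2
    padded = np.pad(f, r, mode="edge")
    g = np.empty(f.shape, dtype=np.float64)
    for x in range(f.shape[0]):
        for y in range(f.shape[1]):
            g[x, y] = op(padded[x:x + size, y:y + size])
    return g
```

For example, `apply_neighborhood_op(f, np.median)` gives a 3x3 median filter, and `apply_neighborhood_op(f, np.mean)` gives a 3x3 averaging filter.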
2. Frequency Domain Methods 

The foundation of frequency domain techniques is the convolution 
theorem. The processed image, g(x,y), is formed by the convolution of an image 
f(x,y) and a linear, position-invariant operation h(x,y), that is 

g(x,y) = h(x,y)*f(x,y). (2) 

By the convolution theorem, the following frequency domain relation holds: 

G(u,v) = H(u,v) F(u,v), (3) 

where G, H, and F are the Fourier transforms of g, h and f respectively. 
H(u,v) is called the transfer function of the process. In a typical image 
enhancement application, f(x,y) is given and the goal, after computing F(u,v), is 
to select an H(u,v) so that the desired image g(x,y) exhibits some highlighted 
feature of f(x,y), i.e. 

$$ g(x,y) = \mathcal{F}^{-1}\left[ H(u,v)\, F(u,v) \right]. \tag{4} $$

For instance, edges in f(x,y) can be accentuated by using a function H(u,v) 
that emphasizes the high-frequency components of F(u,v). 
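Equation (4) translates into a few NumPy FFT calls. In this sketch, H is sampled on a centered frequency plane (zero frequency in the middle, as produced by `np.fft.fftshift`). The `high_emphasis` transfer function H = 1 + k·D/D_max is an illustrative choice of a function that emphasizes high frequencies; it is not a filter defined in this thesis.

```python
import numpy as np

def filter_frequency_domain(f, H):
    """Return g = F^{-1}[H(u,v) F(u,v)] for a transfer function H sampled on the
    centred frequency plane (zero frequency in the middle, as after fftshift)."""
    F = np.fft.fftshift(np.fft.fft2(np.asarray(f, dtype=np.float64)))
    return np.real(np.fft.ifft2(np.fft.ifftshift(H * F)))

def high_emphasis(shape, k=1.0):
    """Illustrative H(u,v) = 1 + k * D(u,v) / D_max: gain 1 at zero frequency,
    rising linearly to 1 + k at the highest sampled frequency."""
    N, M = shape
    U, V = np.meshgrid(np.arange(N) - N // 2, np.arange(M) - M // 2, indexing="ij")
    D = np.sqrt(U**2 + V**2)
    return 1.0 + k * D / D.max()
```

Because H equals 1 at zero frequency, the average gray level is preserved, while the gain above 1 at every other frequency increases the variation around edges.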

3. Global and Local Methods 

Image processing methods that involve using a single transformation 
function for the whole image are classified as global methods or algorithms. The 
lowpass/highpass filters and histogram transformation are examples of global 
enhancement methods. The main advantage of global methods is that they are 
computationally inexpensive and simple to implement. However, global methods 
may attenuate or miss local information while working on the overall 
characteristic of the image. 

The transformation function of a local processing method is dependent on 
the location and the neighborhood of the pixel looked at, i.e. 

g(x,y) = T[x,y, f(x,y)]. (5) 
These methods are therefore “adaptive” to the local information within the 
image. Adaptive histogram equalization is an example of such a local processing 
method and is effective in enhancing details in local areas of the image. 
However, pixels of the same gray level in the original image may be mapped to 
different gray levels in the output image, due to the various “localized” mapping 
or transformation functions, which could artificially alter the appearance of the 
original image. Abrupt changes or boundaries may also result in the image, 
because each transformation is done locally and independently. 
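To illustrate the global/local distinction, the sketch below implements global histogram equalization alongside a simplified local variant that equalizes each tile independently, with a histogram clip limit. The tile grid, the clip-limit convention (a multiple of the mean bin height) and the single-pass redistribution of clipped counts are assumptions of this sketch. It deliberately omits the interpolation between neighboring tile mappings that full CLAHE uses, so it shows exactly the abrupt tile boundaries described above.

```python
import numpy as np

def equalize(block, clip_limit=None):
    """Histogram-equalize an 8-bit block. If clip_limit is given, bins are clipped
    to clip_limit times the mean bin height and the excess is spread uniformly
    over all 256 bins in a single pass (so bins may end slightly above the ceiling)."""
    hist = np.bincount(block.ravel(), minlength=256).astype(np.float64)
    if clip_limit is not None:
        ceiling = clip_limit * block.size / 256.0
        excess = np.maximum(hist - ceiling, 0.0).sum()
        hist = np.minimum(hist, ceiling) + excess / 256.0
    cdf = np.cumsum(hist) / hist.sum()
    lut = np.round(255.0 * cdf).astype(np.uint8)   # gray-level mapping function
    return lut[block]

def tiled_equalize(img, tiles=(4, 4), clip_limit=2.0):
    """Local (adaptive) variant: each tile gets its own mapping. No interpolation
    between tiles is performed, so tile seams may be visible."""
    out = np.empty_like(img)
    rows = np.array_split(np.arange(img.shape[0]), tiles[0])
    cols = np.array_split(np.arange(img.shape[1]), tiles[1])
    for r in rows:
        for c in cols:
            out[np.ix_(r, c)] = equalize(img[np.ix_(r, c)], clip_limit)
    return out
```

`equalize(img)` applies one mapping to the whole image (a global method), whereas `tiled_equalize(img)` applies a different mapping in each tile (a local method). The clip limit restrains how far a nearly uniform tile can be stretched.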

C. FILTERS 

Filtering refers to the selective processing of an image to remove 
unwanted aspects of the image or to transform only certain portions of the image. 
Lowpass filters attenuate or eliminate high-frequency components in the Fourier 
domain, while allowing low frequencies to pass through untouched. As the high 
frequency components characterize edges and other sharp details in an image, 
the net effect of lowpass filtering is image blurring [Gonzalez and Woods, 1993]. 
Hence, lowpass filters are also known as smoothing filters and are commonly 
used for noise reduction. 

Similarly, highpass filters attenuate low-frequency components. Because 
these components are responsible for the slowly varying characteristics of an 
image, such as overall contrast and average intensity, the net result of highpass 
filtering is a reduction of these features and a corresponding apparent 
sharpening of edges and other sharp details. Highpass filters are therefore 
known also as sharpening filters. 

1. Lowpass Filtering 

As indicated earlier, edges and other sharp transitions (such as noise) in 
the gray levels of an image contribute significantly to the high-frequency content 
of its Fourier transform. Hence, blurring or smoothing is achieved in the 
frequency domain by attenuating a specified range of high-frequency 
components in the transform of a given image. 



A 2-D ideal lowpass filter is one whose transfer function in equation (4) 
satisfies the relation: 


$$
H(u,v) = \begin{cases} 1 & \text{if } D(u,v) \le D_0 \\ 0 & \text{if } D(u,v) > D_0 \end{cases} \tag{6}
$$


where $D_0$ is a specified non-negative quantity and D(u,v) is the distance 
from point (u,v) to the origin of the frequency plane, i.e. 

$$ D(u,v) = \left( u^2 + v^2 \right)^{1/2}. \tag{7} $$

The point of transition between H(u,v) = 1 and H(u,v) = 0, $D_0$, is called the 
cutoff frequency. One way to establish this cutoff frequency is to specify the 
percentage of the total signal power to be contained within, or passed by, the filter. $D_0$ is then 
equivalent to the radius of a circle with its origin at the center of a two-dimensional 
frequency plot. For an ideal filter, this transition is an abrupt step, i.e. 
frequencies equal to or less than $D_0$ are passed with no attenuation, while 
frequencies higher than $D_0$ are completely attenuated. However, this sharp cutoff 
cannot be realized with electronic components. 
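Equations (6) and (7), together with the percent-power rule for choosing the cutoff, can be sketched as below. The helper names and the choice to measure distances in frequency-sample units on the centered (fftshift-ed) plane are assumptions of this illustration.

```python
import numpy as np

def distance_grid(shape):
    """D(u,v) from eq. (7), measured from the centre of the shifted frequency plane."""
    N, M = shape
    U, V = np.meshgrid(np.arange(N) - N // 2, np.arange(M) - M // 2, indexing="ij")
    return np.sqrt(U**2 + V**2)

def ideal_lowpass(shape, D0):
    """Eq. (6): 1 inside the circle D(u,v) <= D0, 0 outside."""
    return (distance_grid(shape) <= D0).astype(np.float64)

def cutoff_for_power(f, fraction=0.98):
    """Smallest D0 whose circle passes at least `fraction` of the total power."""
    P = np.abs(np.fft.fftshift(np.fft.fft2(f))) ** 2
    D = distance_grid(np.shape(f)).ravel()
    order = np.argsort(D)                        # frequencies sorted by distance
    cumulative = np.cumsum(P.ravel()[order]) / P.sum()
    idx = min(np.searchsorted(cumulative, fraction), D.size - 1)
    return D[order][idx]
```

Multiplying the centered spectrum by `ideal_lowpass(f.shape, cutoff_for_power(f, 0.98))` then passes at least 98% of the image's spectral power, which is one way to obtain a "98% power" cutoff locus.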

The Butterworth lowpass filter was formulated to address this practical 
limitation, as it does not have a sharp discontinuity between passed and filtered 
frequencies. The Butterworth transfer function (of order n) is defined as follows 
[Gonzalez and Woods, 1993]: 


$$ H(u,v) = \frac{1}{1 + \left[ D(u,v)/D_0 \right]^{2n}}. \tag{8} $$
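Equation (8) is a one-liner once D(u,v) is available. This sketch evaluates it for a few sample distances, with $D_0$ = 20 and n = 2 chosen arbitrarily.

```python
import numpy as np

def butterworth_lowpass(D, D0, n=2):
    """Eq. (8): H = 1 / (1 + (D/D0)^(2n)) for a scalar or an array of D(u,v) values."""
    D = np.asarray(D, dtype=np.float64)
    return 1.0 / (1.0 + (D / D0) ** (2 * n))

# Sample the response at a few distances from the origin, with D0 = 20.
D = np.array([0.0, 10.0, 20.0, 40.0])
H = butterworth_lowpass(D, D0=20.0, n=2)
```

For n = 2 this gives 1 at the origin, about 0.94 at $D_0$/2, exactly 0.5 at the cutoff $D_0$ and about 0.06 at 2$D_0$. Raising n makes the transition steeper, approaching the ideal filter of equation (6).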


Lowpass smoothing filters can also be implemented in the spatial domain. 
Figure 11 shows a general 3x3 linear mask with arbitrary coefficients (weights) $w_1, w_2, \ldots, w_9$. 
Denoting the gray levels of the pixels under the mask at any location by $z_1, z_2, \ldots, z_9$, 
the response of the mask is: 

$$ R = w_1 z_1 + w_2 z_2 + \cdots + w_9 z_9. \tag{9} $$
Figure 11: A 3x3 spatial mask with arbitrary coefficients [From Gonzalez and Woods, 1993].

All the coefficients of the mask are set to a value of 1 for simple smoothing 
processing. The response from the mask would be the sum of gray levels for the 
nine pixels under the mask, as per equation (9). This response R is then scaled 
down by dividing by the total number of pixels (nine in this case) to keep within 
the original gray levels range. Therefore, the response or result would simply be 
the average of all the pixels in the area of the mask. Larger masks (e.g. 5x5 and 
7x7) follow the same concept and will blur the image further with larger 
neighborhood averaging. For the border pixels of the image, there will be a 
shortage of neighborhood pixels for the mask. One option is to pad the shortage 
with pixels of the same values as the center pixel or a reference pixel. Another 
option is to process one layer less of pixels, i.e. no filtering on the border pixels. 
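The 3x3 averaging mask and the two border options can be sketched as follows. The text suggests padding with the center pixel's value or a reference pixel; this sketch instead uses edge replication (copying the nearest border pixel), a common alternative, and also implements the second option of leaving the border pixels unfiltered. The function name and the `border` argument are hypothetical.

```python
import numpy as np

def mean_filter3(f, border="replicate"):
    """3x3 averaging mask (all w_i = 1, response R divided by 9).

    border="replicate" pads with copies of the nearest edge pixels;
    border="skip" leaves the outermost ring of pixels unfiltered.
    """
    f = np.asarray(f, dtype=np.float64)
    N, M = f.shape
    if border == "replicate":
        p = np.pad(f, 1, mode="edge")
        # Sum the nine shifted copies of the padded image: R = z1 + ... + z9.
        R = sum(p[dx:dx + N, dy:dy + M] for dx in range(3) for dy in range(3))
        return R / 9.0
    if border == "skip":
        g = f.copy()
        R = sum(f[dx:dx + N - 2, dy:dy + M - 2] for dx in range(3) for dy in range(3))
        g[1:-1, 1:-1] = R / 9.0
        return g
    raise ValueError("border must be 'replicate' or 'skip'")
```

An isolated bright pixel of value 90 is spread into a 3x3 patch of value 10, which is the blurring effect shown in Figure 12.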

Lowpass filters are generally used for blurring and for noise reduction in 
preprocessing steps, such as the removal of small details from an image prior to 
object extraction, and bridging of small gaps in lines or curves. Figure 12 
illustrates the effect of a lowpass filter. 
Figure 12: Lowpass filtering with a 3x3 spatial filter or a 98% power $D_0$ locus. The top image is the original image and the bottom the processed image, in which the image details have been blurred.
2. Highpass Filtering 

Image sharpening can be achieved in the frequency domain by a highpass 
filtering process as edges and other abrupt changes in gray levels are associated 
with high-frequency components. Such filtering attenuates the low-frequency 
components without disturbing high-frequency information in the Fourier 
transform. Highpass filters are therefore also known as sharpening filters. 

The highpass filtering process can be implemented in both the frequency 
and spatial domains. For highpass filtering in the frequency domain, the transfer 
function of the ideal highpass filter is the complement of that of the ideal lowpass filter in equation (6), i.e. 

$$
H(u,v) = \begin{cases} 0 & \text{if } D(u,v) \le D_0 \\ 1 & \text{if } D(u,v) > D_0 \end{cases} \tag{10}
$$


The transfer function of the Butterworth highpass filter of order n and with
cutoff frequency locus at distance D_0 from the origin is defined by the relation

$$H(u,v) = \frac{1}{1 + \left[ D_0 / D(u,v) \right]^{2n}} \qquad (11)$$
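Equations (10) and (11) can be evaluated point by point. The sketch below assumes a centred spectrum, so that D(u,v) is measured from the centre of the array; the `distance` helper and all function names are illustrative, not taken from the thesis.

```python
import math

def distance(u, v, rows, cols):
    """D(u, v): distance of frequency (u, v) from the centre of a centred spectrum."""
    return math.hypot(u - rows / 2, v - cols / 2)

def ideal_highpass(d, d0):
    """Equation (10): reject frequencies on or inside the cutoff locus D0."""
    return 0.0 if d <= d0 else 1.0

def butterworth_highpass(d, d0, n):
    """Equation (11): H = 1 / (1 + (D0 / D)^(2n)); H falls smoothly to 0 near the origin."""
    if d == 0:
        return 0.0  # limiting value at the zero frequency
    return 1.0 / (1.0 + (d0 / d) ** (2 * n))
```

At D = D_0 the Butterworth filter passes exactly half the amplitude for any order n, whereas the ideal filter jumps abruptly from 0 to 1; higher orders make the Butterworth transition steeper.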


The principal objective of sharpening is to highlight fine detail in an image 
or to enhance detail that has been blurred, either in error or as a natural effect of 
a particular method of image acquisition. Uses of image sharpening vary and 
include applications ranging from electronic printing to medical imaging to 
industrial inspection and autonomous object detection. 

A basic 3x3 highpass spatial mask is shown in Figure 13. The center 
coefficient is positive while the rest of the mask contains negative coefficients. 
The sum of the coefficients is then equal to zero. Thus, the output of the mask is 
zero or very small when the mask is over an area of constant or slowly varying 
gray level. As with highpass frequency filtering, the zero-frequency term is 
attenuated or eliminated. This will reduce the average gray-level value in the 
image to zero, which in turn reduces the global contrast of the image. The 
expected result from such a highpass mask is therefore characterized by 
highlighted edges over a dark background. Reducing the average value of an
image to zero also implies that the image may have negative gray levels due to
the negative coefficients in the mask. The results therefore have to be adjusted
or clipped, and scaled down (by dividing by the number of pixels in the mask), to
keep the output within the original (non-negative) gray-level range.


$$\begin{bmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{bmatrix}$$

Figure 13: A basic highpass spatial filter [From Gonzalez and Woods, 1993].
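The zero-sum property of the mask in Figure 13 is easy to check numerically. This sketch (hypothetical helper names, 8-bit gray levels assumed) computes the raw mask response at an interior pixel and then applies the scale-and-clip step described above.

```python
HIGHPASS_MASK = [[-1, -1, -1],
                 [-1,  8, -1],
                 [-1, -1, -1]]  # Figure 13; the coefficients sum to zero

def mask_response(img, r, c, mask=HIGHPASS_MASK):
    """Sum of products of the mask and the 3x3 neighbourhood of interior pixel (r, c)."""
    return sum(mask[i][j] * img[r - 1 + i][c - 1 + j]
               for i in range(3) for j in range(3))

def to_gray_range(response, levels=256):
    """Scale down by the mask size (9) and clip to [0, levels - 1]."""
    return min(levels - 1, max(0, round(response / 9)))
```

On a flat 3x3 patch the response is 0. Across a vertical edge with columns 10, 200, 200, the response at the centre is 570, which scales to 63; on the dark side of the same edge (columns 10, 10, 200) the response is -570 and is clipped to 0, which is why the output shows bright edges on a dark background.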

A highpass filtered image can be computed as the difference between the 
original image and a lowpass filtered version of the same image, as the highpass 
filter is the complement of the lowpass, i.e., 

$$\text{Highpass} = \text{Original} - \text{Lowpass} \qquad (12)$$

Multiplying the original image by an amplification factor, denoted by A,
yields the definition of a high-boost or high-frequency-emphasis filter, i.e.,

$$\begin{aligned} \text{Highboost} &= (A)(\text{Original}) - \text{Lowpass} \\ &= (A-1)(\text{Original}) + \text{Original} - \text{Lowpass} \\ &= (A-1)(\text{Original}) + \text{Highpass} \qquad (13) \end{aligned}$$

When A > 1, part of the original is added back to the highpass result, which
partially restores the low-frequency components lost in the highpass filtering
operation. The result is that the high-boost image looks more like the original
image, with a relative degree of edge enhancement that depends on the value of
A. When the lowpass image is obtained with the 3x3 averaging mask, the high-boost
mask keeps the -1 surrounding coefficients of Figure 13 (before the division by
nine described above), and its center weight becomes

$$w_5 = 9A - 1, \qquad A \ge 1 \qquad (14)$$

When A = 1, the basic highpass filter of Figure 13 is obtained
[Gonzalez and Woods, 1993].
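Equations (13) and (14) can be checked against each other at a single pixel. The sketch below assumes the 3x3 averaging mask as the lowpass filter and uses hypothetical helper names; it builds the high-boost mask from equation (14) and compares its scaled output with (A-1)(Original) + Highpass.

```python
def highboost_mask(a):
    """3x3 high-boost mask before the division by 9: -1 around, 9A - 1 in the centre."""
    mask = [[-1.0] * 3 for _ in range(3)]
    mask[1][1] = 9 * a - 1  # equation (14)
    return mask

def filter_at(mask, img, r, c):
    """Mask output at interior pixel (r, c), divided by 9."""
    return sum(mask[i][j] * img[r - 1 + i][c - 1 + j]
               for i in range(3) for j in range(3)) / 9

img = [[1, 2, 3], [4, 9, 6], [7, 8, 9]]
a = 1.8                                   # the value used for Figure 14
lowpass = sum(map(sum, img)) / 9          # 3x3 neighbourhood average
highpass = img[1][1] - lowpass            # equation (12)
expected = (a - 1) * img[1][1] + highpass # equation (13)
```

For A = 1.8 the center weight is 9(1.8) - 1 = 15.2, and `filter_at(highboost_mask(a), img, 1, 1)` agrees with `expected` (about 10.76) up to floating-point rounding; with A = 1 the center weight reduces to the 8 of Figure 13.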
Figure 14: High-boost filtering with A = 1.8. The bottom image is the
processed image. The brightness of the image is lowered and the 
features of the ships sharpened. 
D. HISTOGRAM

An image histogram is a plot of the distribution of intensities or gray levels
in an image. The histogram of a digital image with gray levels in the range
[0, L-1] can be represented by the discrete function

$$p(r_k) = \frac{n_k}{n} \qquad (15)$$

where r_k is the kth gray level, n_k is the number of pixels in the image with
that gray level, n is the total number of pixels in the image, and k = 0, 1, 2, ..., L-1.
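Equation (15) amounts to counting pixels per gray level and dividing by the total. A minimal sketch, assuming integer gray levels 0 to L-1 stored as a list of rows (the function name is hypothetical):

```python
def normalized_histogram(image, levels):
    """Equation (15): p(r_k) = n_k / n for k = 0, 1, ..., L-1."""
    counts = [0] * levels          # n_k for every gray level
    for row in image:
        for value in row:
            counts[value] += 1
    n = sum(counts)                # total number of pixels
    return [n_k / n for n_k in counts]
```

For the 2x2 image `[[0, 1], [1, 3]]` with L = 4 this gives `[0.25, 0.5, 0.0, 0.25]`; the entries always sum to 1, which is why the histogram can be read as an estimate of the gray-level probabilities.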




Figure 15: Histograms of four basic image types [After Gonzalez
and Woods, 1993].
The image histogram gives an estimate of the probability of occurrence of
a gray level r_k. A plot of this function for all values of k also provides a global
description of the appearance of an image. For example, Figure 15 shows the
histograms of four basic types of images. The histogram in Figure 15(a) shows
that the gray levels are concentrated toward the dark end of the gray scale
range. Thus, this histogram corresponds to an image with overall dark
characteristics. Figure 15(b) shows the opposite case, a bright image. The histogram
shown in Figure 15(c) has a narrow shape, which indicates little dynamic range
and thus corresponds to an image having low contrast, while Figure 15(d) shows
a histogram with significant spread, corresponding to an image with high
contrast.

Although the histogram does not provide any specific information about
the image content, the shape and distribution of the histogram provide a basis
for contrast enhancement. However, the histogram is a global representation of
the intensity characteristics within an image; therefore, a histogram
transformation affects the whole image, i.e. it acts globally. This differs from
localized methods such as spatial masks and filters, whose output depends only
on the pixel being processed and its neighbors.

E. HISTOGRAM EQUALIZATION 

The histogram of an image represents the relative frequency of
occurrence of gray levels within an image. It also represents the probability of
such an occurrence. With a narrow distribution of gray levels (refer to Figure
15(c)), the contrast in the image will be low and the dynamic range limited.
Hence, a good gray-level assignment scheme would expand the intensity
range to fill the whole available dynamic range, ideally with all gray levels
occurring with equal (uniform) probability. In histogram equalization, the goal is
to obtain a uniform histogram distribution for the output image, so that an optimal
overall contrast is perceived.
An outline of the histogram equalization process is as follows [Gonzalez
and Woods, 1993]: 

Let the variable r represent the gray levels in the image to be enhanced or
equalized. The pixel values can be normalized to form continuous quantities in
the interval [0, 1], with r = 0 representing black and r = 1 representing white.

For any r in the interval [0, 1], the transformation is of the form:

$$s = T(r) \qquad (16)$$

which produces a gray level s for every level of r in the original image. It is
assumed that the transformation function given in equation (16) satisfies the
conditions: (a) T(r) is single-valued and monotonically increasing in the interval
0 ≤ r ≤ 1; and (b) 0 ≤ T(r) ≤ 1 for 0 ≤ r ≤ 1. Condition (a) preserves the order from
black to white in the gray scale, whereas condition (b) guarantees a mapping that
is consistent with the allowed range of gray levels.

The inverse transformation from s back to r is then denoted

$$r = T^{-1}(s), \qquad 0 \le s \le 1 \qquad (17)$$

where the assumption is that T^{-1}(s) also satisfies conditions (a) and (b)
with respect to the variable s.

The gray levels in an image may be viewed as random quantities in the
interval [0, 1]. If they are continuous variables, both the original and transformed
gray levels can be characterized by their probability density functions p_r(r) and
p_s(s), respectively, where the subscripts on p are used to indicate that p_r and p_s
are different functions.

The probability density function of the transformed gray levels can
therefore be expressed by:

$$p_s(s) = \left[ p_r(r) \frac{dr}{ds} \right]_{r = T^{-1}(s)} \qquad (18)$$
Consider the transformation function

$$s = T(r) = \int_0^r p_r(w)\,dw, \qquad 0 \le r \le 1 \qquad (19)$$

where w is a dummy variable of integration. Equation (19) is actually the 
cumulative distribution function (CDF) of r. Conditions (a) and (b) presented 
earlier are satisfied by this transformation function, because the CDF increases 
monotonically from 0 to 1 as a function of r. 

From equation (19), the derivative of s with respect to r is

$$\frac{ds}{dr} = p_r(r) \qquad (20)$$

Substituting equation (20) into equation (18) yields

$$p_s(s) = \left[ p_r(r) \frac{1}{p_r(r)} \right]_{r = T^{-1}(s)} = 1, \qquad 0 \le s \le 1 \qquad (21)$$

which gives a uniform density in the interval of the transformed variable s. 
This result is independent of the inverse transformation function. Thus, using the 
cumulative distribution function of r as the transformation function produces an 
image with uniform density gray levels and with better contrast distribution. 

For the discrete formulation, the probabilities are replaced by:

$$p_r(r_k) = \frac{n_k}{n}, \qquad 0 \le r_k \le 1, \quad k = 0, 1, \ldots, L-1 \qquad (22)$$

and equation (19) is given by the relation

$$s_k = T(r_k) = \sum_{j=0}^{k} \frac{n_j}{n} = \sum_{j=0}^{k} p_r(r_j) \qquad (23)$$

A MATLAB implementation for the histogram equalization is available in 
Appendix A. 
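As a language-neutral complement to the MATLAB implementation in Appendix A (which is not reproduced here), equation (23) can be sketched in Python. The sketch assumes integer gray levels 0 to L-1 and rescales s_k from [0, 1] back to integer levels by multiplying by L-1 and rounding; that rescaling step is an assumption of this example.

```python
def equalize(image, levels=256):
    """Discrete histogram equalization following equation (23).

    s_k = sum_{j<=k} n_j / n lies in [0, 1]; it is rescaled to 0..L-1 and
    rounded so that the output is again an integer gray-level image.
    """
    counts = [0] * levels
    for row in image:
        for value in row:
            counts[value] += 1
    n = sum(counts)
    mapping, cumulative = [], 0
    for n_k in counts:
        cumulative += n_k
        s_k = cumulative / n                       # equation (23)
        mapping.append(round(s_k * (levels - 1)))  # back to integer levels
    return [[mapping[v] for v in row] for row in image], mapping
```

For a dark, low-contrast 2x4 image `[[1, 1, 2, 2], [2, 3, 3, 3]]` with L = 8, the mapping is `[0, 2, 4, 7, 7, 7, 7, 7]` and the output is `[[2, 2, 4, 4], [4, 7, 7, 7]]`: the input occupying levels 1 to 3 is spread over levels 2 to 7. In the discrete case the result is only approximately uniform, because pixels sharing a gray level cannot be split.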
Figure 16: Result of histogram equalization. The bottom image is
the processed output. 
Figure 17: Image histograms before and after equalization.

Figures 16 and 17 show the histogram equalization results and the
corresponding histograms. The improvement over the original image is quite
evident, as the treeline and foliage are now much more clearly defined. Looking
at the histogram plots, the gray levels of the equalized image are spread out,
resulting in an increase in the dynamic range of gray levels and hence in the
overall contrast of the image.

Histogram equalization significantly improves the visual appearance of the 
image. Similar enhancement results could have been achieved by using a 
contrast stretching approach, but the main advantage of histogram equalization 
over manual contrast stretching or manipulation techniques is that the former is 
fully automatic, without the need to select any setting or to adapt to the original 
histogram distribution of the image. 

F. ADAPTIVE HISTOGRAM EQUALIZATION 

In low contrast images, the features of interest may occupy only a 
relatively narrow range of gray scale, with the majority of gray levels occupied by 
“uninteresting areas” such as background and noise. These “uninteresting areas” 
may also generate large counts of pixels and hence, large peaks in the 
histogram. In this case, the global histogram equalization amplifies the image 
noise and increases visual graininess or patchiness. The global histogram 
equalization technique does not adapt to local contrast requirements, and minor 
contrast differences can be entirely missed when the number of pixels falling in a 
particular gray range is small. 

Adaptive Histogram Equalization (AHE) is a modified histogram
equalization procedure that optimizes contrast enhancement based on local
image data. The basic idea behind the scheme is to divide the image into a grid
of rectangular contextual regions, and to apply a standard histogram equalization
in each. The optimal number of contextual regions and the size of the regions
depend on the type of input image; a commonly used division is a grid of 8x8
contextual regions. In addition, a bilinear interpolation scheme is used to avoid
discontinuity issues at the region boundaries.

Figure 18 illustrates the application of the interpolation scheme at the
boundaries. The gray-level assignment at the sample position indicated by the
white dot is derived from the gray-value distributions in the surrounding contextual
regions. The points A, B, C, and D are the centers of the surrounding contextual
regions; the region-specific gray-level mappings g_A(s), g_B(s), g_C(s) and g_D(s)
are based on the histogram equalization of the pixels each region contains. Thus,
assuming that the original pixel intensity at the sample point is s, its new gray
value s' is calculated by bilinear interpolation of the gray-level mappings that were
calculated for each of the surrounding contextual regions:

$$s' = (1-y)\left[(1-x)\,g_A(s) + x\,g_B(s)\right] + y\left[(1-x)\,g_C(s) + x\,g_D(s)\right] \qquad (24)$$

where x and y are normalized distances with respect to the point A. This
gray-level interpolation is repeated over the entire image [Zuiderveld, 1994].
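Equation (24) can be written directly as a function of the four regional lookup tables. This sketch assumes the layout implied by the equation (B to the right of A, C below A, D diagonally opposite) and represents each mapping g as a Python list indexed by gray level; the function name is hypothetical.

```python
def interpolate_mappings(s, x, y, g_a, g_b, g_c, g_d):
    """Equation (24): bilinear blend of four regional mappings (lookup tables).

    x, y are the normalized distances (0..1) of the pixel from centre A,
    with B to the right of A, C below A and D diagonally opposite.
    """
    top = (1 - x) * g_a[s] + x * g_b[s]
    bottom = (1 - x) * g_c[s] + x * g_d[s]
    return (1 - y) * top + y * bottom
```

At x = y = 0 the pixel takes the value of mapping A alone; midway between the four centres (x = y = 0.5) it receives the plain average of the four mapped values. For example, if the four mappings send gray level 1 to 10, 20, 30 and 40, the midpoint result is 25.0. Because the weights vary continuously with position, no seams appear at the region boundaries.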
Figure 18: Bilinear interpolation to eliminate region boundaries
[From Zuiderveld, 1994].


AHE is able to overcome the limitations of the standard equalization 
method as discussed earlier, and achieves a better presentation of information 
present in the image. However, AHE is unable to distinguish between noise and 
features in the local contextual regions. Hence, background noise is amplified in 
“flat” or “featureless” regions of the image, which is a major drawback of the 
method. 
G. CONTRAST LIMITED ADAPTIVE HISTOGRAM EQUALIZATION

The noise problem associated with AHE can be reduced by limiting 
contrast enhancement specifically in homogeneous areas. These areas can be 
characterized by a high peak in the histogram associated with the contextual 
regions since many pixels fall inside the same gray level range. The Contrast 
Limited Adaptive Histogram Equalization (CLAHE) limits the slope associated 
with the gray level assignment scheme to prevent saturation, as illustrated in 
Figure 19. This process is accomplished by allowing only a maximum number of 
pixels in each of the bins associated with the local histograms. After “clipping” the 
histogram, the clipped pixels are equally redistributed over the whole histogram 
to keep the total histogram count identical. The CLAHE process is summarized in 
Table 1. 



Figure 19: Principle of contrast limiting as used in CLAHE. (a) 
Histogram of a contextual region containing many background pixels, 
(b) Calculated cumulative histogram, (c) Clipped histogram with excess 
pixels redistributed throughout the histogram, (d) Cumulative clipped 
histogram with maximum slope set to the clip limit [From Zuiderveld, 
1994].


The clip limit is defined as a multiple of the average histogram contents
(the average number of pixels per bin) and acts as a contrast factor. Setting a
very high clip limit effectively disables the clipping, and the process becomes a
standard AHE technique. A clip or contrast factor of one prohibits any contrast
enhancement, preserving the original image.
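The clip limit and the clipping step of Figure 19 can be sketched as below. This is a simplified single-pass redistribution written for this illustration; Zuiderveld's code and the MATLAB toolbox may distribute the excess differently, and the function names are hypothetical.

```python
def clip_limit(n_pixels, n_bins, factor):
    """Clip limit as `factor` times the average bin count (at least 1 pixel)."""
    return max(1, int(factor * n_pixels / n_bins))

def clip_histogram(hist, limit):
    """Clip every bin at `limit` and spread the excess evenly over all bins.

    Single-pass simplification: bins that receive a share may end up slightly
    above the limit, and any remainder is added one pixel per bin at an even stride.
    """
    clipped = [min(h, limit) for h in hist]
    excess = sum(hist) - sum(clipped)
    share, remainder = divmod(excess, len(hist))
    result = [h + share for h in clipped]
    for k in range(remainder):
        result[k * len(hist) // remainder] += 1
    return result
```

For the histogram `[0, 10, 2, 0]` and a factor of 1, the limit equals the average of 3 pixels per bin; the 7 excess pixels are redistributed to give `[2, 5, 4, 1]`, so the total count of 12 is unchanged while the dominant peak, and hence the slope of the cumulative mapping, is greatly reduced. A very large factor leaves the histogram untouched, which corresponds to plain AHE.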

Table 1. Summary of CLAHE process [Mathworks, 2003].

1. Obtain all the inputs: 

• Image 

• Number of regions in row and column directions 

• Number of bins for the histograms used in building image 
transform function (dynamic range) 

• Clip limit for contrast limiting (normalized from 0 to 1) 

2. Pre-process the inputs: 

• Determine real clip limit from the normalized value. 

• If necessary, pad the image (to even size) before splitting 
into regions. 

3. Process each contextual region (tile) thus producing gray level 
mappings: 

• Extract a single image region. 

• Make a histogram for this region using the specified number 
of bins. 

• Clip the histogram using clip limit. 

• Create a mapping (transformation function) for this region. 

4. Interpolate gray level mappings in order to assemble final 
CLAHE image: 

• Extract cluster of four neighboring mapping functions. 

• Process image region partly overlapping each of the 
mapping tiles. 

• Extract a single pixel, apply four mappings to that pixel, and 
interpolate between the results to obtain the output pixel. 

• Repeat over entire image. 
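The steps of Table 1 can be assembled into a compact, purely illustrative Python sketch. It is not the MATLAB toolbox implementation. It assumes integer gray levels 0 to levels-1 with one histogram bin per level, an image whose height and width divide evenly into the tile grid (so the padding of step 2 is omitted), and a simplified single-pass redistribution of the clipped pixels. Pixels lying outside the outermost tile centres use the nearest tile mappings. All names are hypothetical.

```python
import math

def clip_histogram(hist, limit):
    """Clip bins at `limit`; spread the excess evenly over all bins (single pass)."""
    clipped = [min(h, limit) for h in hist]
    excess = sum(hist) - sum(clipped)
    share, remainder = divmod(excess, len(hist))
    result = [h + share for h in clipped]
    for k in range(remainder):
        result[k * len(hist) // remainder] += 1
    return result

def tile_mapping(image, r0, r1, c0, c1, clip_factor, levels):
    """Step 3 of Table 1: histogram, clip and cumulative mapping for one tile."""
    hist = [0] * levels
    for r in range(r0, r1):
        for c in range(c0, c1):
            hist[image[r][c]] += 1
    n = (r1 - r0) * (c1 - c0)
    hist = clip_histogram(hist, max(1, int(clip_factor * n / levels)))
    mapping, running = [], 0
    for count in hist:
        running += count
        mapping.append((levels - 1) * running / n)
    return mapping

def _neighbours(pos, size, count):
    """Indices of the two nearest tile centres along one axis, plus the weight."""
    g = (pos - (size - 1) / 2) / size          # position in tile-centre units
    i0 = min(max(math.floor(g), 0), count - 1)
    i1 = min(i0 + 1, count - 1)
    return i0, i1, min(max(g - i0, 0.0), 1.0)

def clahe(image, tiles=(2, 2), clip_factor=2.0, levels=256):
    """Simplified CLAHE: assumes the image divides evenly into tiles."""
    rows, cols = len(image), len(image[0])
    ty, tx = tiles
    th, tw = rows // ty, cols // tx
    maps = [[tile_mapping(image, i * th, (i + 1) * th, j * tw, (j + 1) * tw,
                          clip_factor, levels) for j in range(tx)]
            for i in range(ty)]
    out = []
    for r in range(rows):
        i0, i1, y = _neighbours(r, th, ty)
        row = []
        for c in range(cols):
            j0, j1, x = _neighbours(c, tw, tx)
            s = image[r][c]
            top = (1 - x) * maps[i0][j0][s] + x * maps[i0][j1][s]
            bottom = (1 - x) * maps[i1][j0][s] + x * maps[i1][j1][s]
            row.append(round((1 - y) * top + y * bottom))  # equation (24)
        out.append(row)
    return out
```

With a single tile and a very large clip factor, the sketch reduces to global histogram equalization. For an 8x8 image holding the distinct values 100 to 163, the darkest pixel maps to 4 and the brightest to 255. Smaller clip factors cap the slope of every tile mapping, which is the contrast-limiting behaviour described above.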


The CLAHE process and command can be found in the Image Processing 
Toolbox (version 4.1) of MATLAB (version 6.5, release 13). 
The main advantages of the CLAHE transform are its modest
computational requirements, ease of use and excellent results on most images. 
Figure 20 compares the CLAHE result to that obtained by the standard histogram 
equalization method. The CLAHE image has less amplified noise and avoids the 
brightness saturation in the standard histogram equalization. Additional 
comparison samples are included in Appendix B. 

CLAHE does have its limitations. Since the method is aimed at optimizing
contrast, there is no direct one-to-one relationship between the gray values of the
original image and the CLAHE-processed result. Pixels of the same gray level in
the original image may be mapped to different gray levels in the output image,
because of the equalization process and bilinear interpolation. Consequently,
CLAHE images are not suited for quantitative measurements that rely on the
physical meaning of image intensity [Zuiderveld, 1994].







Figure 20: Comparison of images obtained from standard histogram 
equalization (top image) and from CLAHE (bottom image). The CLAHE 
image has less amplified noise and avoids saturation by the bright 
source in the image. Figure 8 contains the original image. 
III. IMAGE ENHANCEMENT BY CLAHE


A. SPATIAL FREQUENCY 

An image can be expressed in both the spatial and the frequency 
domains. The spatial domain is simply the two-dimensional image space which 
contains an array of pixels with intensity values representing the image. The 
image can be converted from the spatial domain to the frequency domain by the
Fourier transform.

The periodicity with which the image intensity values change is commonly
referred to as the spatial frequency. The image value at each position (f_x, f_y) in
the frequency domain represents the amount by which the intensity values in the
image vary over a specific distance related to the spatial frequencies f_x and f_y (for
a two-dimensional image). For a simple image that is uniformly gray, i.e. with a
single gray value in all pixels, there will be no nonzero frequency component in
either the x- or the y-direction, although there will still be a zero-frequency
component corresponding to the single gray value of the image, in other words, the
DC component of the image. If there is a change in intensity or gray-level values,
there will be some frequency components along the direction of change in the
frequency domain. There will be only one frequency component if the change is
purely sinusoidal.

For example, suppose that there is the value 20 at the point that 
represents the frequency 0.1 (or 1 period every 10 pixels). This means that in the 
corresponding spatial domain, the intensity values vary from dark to light and 
back to dark over a distance of 10 pixels, and that the contrast between the 
lightest and darkest is 40 gray levels (2 times 20). 
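The example above can be checked numerically. The sketch assumes one image row of 100 pixels with a mean gray level of 128 (a value chosen here for illustration). It reads the "value" at a frequency as the single-sided amplitude 2|F(k)|/N, which is one common normalization; other normalizations would give a different number for the same row.

```python
import cmath
import math

def dft_coefficient(signal, k):
    """k-th coefficient of the 1-D discrete Fourier transform of `signal`."""
    n = len(signal)
    return sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))

N = 100
# One image row: mean gray level 128, sinusoid of amplitude 20, period 10 pixels.
row = [128 + 20 * math.cos(2 * math.pi * 0.1 * t) for t in range(N)]

dc = dft_coefficient(row, 0).real / N               # zero-frequency (DC) term
amplitude = 2 * abs(dft_coefficient(row, 10)) / N   # bin 10 of 100 = 0.1 cycle/pixel
peak_to_peak = max(row) - min(row)                  # lightest minus darkest
```

Here `dc` is about 128, the baseline intensity, `amplitude` is about 20, and `peak_to_peak` is about 40 gray levels, matching the "2 times 20" contrast stated in the text.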

The significance and correlation of the spatial frequency to the image is 
illustrated in Figure 21. A simple square-in-square image is generated with 
different degrees of contrast against the background as shown. For the first 
image, the background is set at a gray level of 100 and the square at 128, while 
for the second image, the background is set at 0 (black) and the square at the
same level of 220. The corresponding spatial frequency spectra are plotted, and
the increased higher-frequency components due to the increased contrast
between object and background are clearly shown.



Figure 21: A simple image with its corresponding spatial frequency 
spectrum and the same image with a higher contrast between object 
and background, showing increased higher frequency components. 
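A small numerical version of the Figure 21 experiment: the sketch builds a hypothetical 8x8 square-in-square image at two contrast levels (background 100 or 0, square 128; these levels are chosen for illustration) and compares the spectral power outside the zero frequency. A direct DFT is used, which is practical only for tiny images.

```python
import cmath
import math

def dft2(img):
    """Direct 2-D DFT (O(N^4); fine for tiny illustrative images only)."""
    m, n = len(img), len(img[0])
    return [[sum(img[x][y] * cmath.exp(-2j * math.pi * (u * x / m + v * y / n))
                 for x in range(m) for y in range(n))
             for v in range(n)] for u in range(m)]

def square_in_square(size, inner, background, square):
    """A `size` x `size` image with a centred `inner` x `inner` square."""
    lo, hi = (size - inner) // 2, (size + inner) // 2
    return [[square if lo <= r < hi and lo <= c < hi else background
             for c in range(size)] for r in range(size)]

def non_dc_power(img):
    """Total spectral power excluding the zero-frequency term F(0, 0)."""
    spectrum = dft2(img)
    total = sum(abs(f) ** 2 for row in spectrum for f in row)
    return total - abs(spectrum[0][0]) ** 2

low = square_in_square(8, 4, 100, 128)   # object/background contrast of 28
high = square_in_square(8, 4, 0, 128)    # object/background contrast of 128
```

Because the non-zero-frequency part of the spectrum depends only on the object-to-background difference, its power grows with the square of the contrast: `non_dc_power(high) / non_dc_power(low)` equals (128/28)^2, roughly 20.9, up to rounding.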


Hence, a high spatial frequency represents a large change in
intensity or contrast over short image distances. This can be translated to edges
and sharp details in the image. The larger the amplitude or the frequency power,
the greater the contrast change. The zero frequency in the frequency domain
corresponds to the baseline intensity level in the image [HIPR, 2003].
To reinforce this point, the standard test image “Lenna” is used to illustrate
the visual effect of boosting the higher spatial frequencies. The original grayscale
“Lenna” image (512x512) is converted to the frequency domain, and
components beyond the 150th pixel (an arbitrary choice) away from the zero
frequency are enhanced 250% in magnitude. The resulting image, shown on
the right of Figure 22, has sharper details (e.g. the lines of the hat). Hence,
increasing the power of the higher frequency components enhances the edges
and sharpens the details in the image, much like a high-boost filter.
The bottom pair of images in Figure 22 illustrates the effect of increasing the
zero-frequency component by 20%, which increases the brightness of the image.
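The boosting experiment can be imitated in one dimension. This sketch uses a hypothetical 16-sample step edge, a direct DFT rather than an FFT, and a gain of 2.5, which is one reading of "enhanced 250%". It scales matching positive and negative frequencies together so that the result stays real. The helper names are illustrative.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real for t in range(n)]

def boost(signal, cutoff, high_gain=1.0, dc_gain=1.0):
    """Scale the DC term by dc_gain and every component beyond `cutoff` by high_gain."""
    spectrum = dft(signal)
    n = len(spectrum)
    for k in range(n):
        distance = min(k, n - k)   # bins k and n - k are the +/- frequencies
        if k == 0:
            spectrum[k] *= dc_gain
        elif distance > cutoff:
            spectrum[k] *= high_gain
    return idft(spectrum)

def mean(values):
    return sum(values) / len(values)

def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

edge = [50.0] * 8 + [150.0] * 8
brighter = boost(edge, cutoff=3, dc_gain=1.2)    # zero frequency +20%
sharper = boost(edge, cutoff=3, high_gain=2.5)   # high frequencies x2.5
```

Raising the DC term by 20% lifts every sample by the same amount (the mean goes from 100 to 120) without changing the variation, which is a pure brightness change. Amplifying the high frequencies leaves the mean unchanged but increases the variation around the edge, which corresponds to the sharpening seen in the top pair of Figure 22.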





Figure 22: Effect of adjusting spatial frequency powers on the 
image. The top pair of images illustrates an increase in the power of 
the higher frequency components, while the bottom pair represents an 
increased power in the zero frequency component. 
B. IMAGE QUALITY ASSESSMENT


Image processing is naturally concerned with producing better
images. But the key question is how to quantify or measure the term “better”
in image quality assessment. There is no absolute measuring scale like the
kilogram for weight or the meter for distance. The fact remains that the image is
ultimately perceived by a pair of human eyes and interpreted by the human brain
for whatever purpose the image is intended. Hence, the assessment of image
quality is always subjective. There have been attempts to introduce objective
methods of assessing image quality, such as mean-square error,
probability of detection and peak signal-to-noise ratio [Barrett, 1990]. But the
basic difficulty is that images can be used for a variety of functions or purposes
(e.g. classification, detection and measurement). A “good” image for one purpose
may not be suitable for another. Furthermore, the performance of the human
visual system (including the human brain) is not consistent even for the same
image, let alone among individuals. Experience, eyesight, training, age, physical
condition and fatigue will all affect the final interpretation of the image.

An image is always produced for a specific purpose or task, and the only
meaningful measure of its quality is how well it fulfills that purpose. An objective
approach to assessment of image quality must therefore start with a specification
of the task and then determine quantitatively how well the task is performed or
achieved [Barrett, 1990]. For example, in assessing the image quality for image
compression, the mean-square error is a relevant and objective measure of the
amount of distortion in the compressed image: the smaller the error, the better
the image. In the case of night vision images, their main purpose is the
detection of objects and providing information about the surroundings when the
human eyes are not sensitive enough under low-illumination conditions. A
quantitative measure for such a purpose would be the probability of detection or
the time to detection. However, all the II and TI images used in this thesis are
samples provided by the Naval Research Laboratory, as suitable imagers were
not available at the time of the study. Some of the images contain identifiable
objects, such as ships and a fence, while others are just general outdoor scenes of
foliage. Unfortunately, there is no “hidden” object implanted in the scenes with
which to measure quantitatively the quality of the image with respect to its purpose
for detection.

Another objective measure of night vision images could be the number of 
edges or the intensity of the edges in the image. With more enhanced edges, 
more details and more information can be perceived from the image. As 
discussed in the previous section, edges in the image correspond to high spatial 
frequencies. Hence, for the same image, if there is more power in the higher 
spatial frequencies, the edges will be enhanced and hence, more details will be 
detectable. This is similar in principle to the highpass filter in the frequency 
domain as described in Chapter II. In this respect, the quality of the image can 
therefore be judged to be better, as the enhanced edges would improve the 
information content of the image, and the increased power in the high spatial 
frequencies can be measured objectively. 

C. ANALYSIS OF ENHANCEMENT RESULTS

A CLAHE-processed night vision image is compared to its original 
unprocessed version in Figure 23. The CLAHE processed image appears to 
have “better clarity” as image edges and details have been enhanced by the 
CLAHE process. The profile of the foliage and the river bank are “easier” to 
identify. The single small tree in the center of the image is a good example of 
enhancement produced by CLAHE. This edge enhancement should therefore, in
theory, be accompanied by increased higher spatial frequency components
in the frequency domain of the image. Our aim is to compare the frequency
spectra of the original and the processed image for increased higher frequencies
and to use this difference as an objective basis for judging improvement in image
quality, instead of relying solely on subjective visual assessment.
Figure 23: Unprocessed (top) and CLAHE-processed (bottom) night vision
images for comparing the improvement in image contrast and
detail enhancement by CLAHE.
1. Spatial Frequency Spectrum

The image is first converted from the spatial domain to the frequency 
domain by using the 2-dimensional discrete Fast Fourier Transform (FFT) in 
MATLAB. The image is “padded” (to 1024x1024) during the FFT process, i.e.
zeros are added to the beginning and/or end of the spatial-domain sequence. This
padding samples the frequency spectrum more finely without adding new spectral
content to the image. As the image size is 480x640, padding the image to square
dimensions that are a power of 2 (2^10 = 1024) also reduces the FFT
computation time. The Fourier transform is also shifted to center the zero
frequency with respect to the image center. The frequency power spectrum is
then plotted using the “mesh” command in MATLAB.
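The padding and centering steps can be sketched in pure Python for a tiny image. This is a stand-in for the MATLAB FFT routines, built on several assumptions. It uses a direct DFT, so it is only practical for small arrays, and it pads with zeros at the bottom and right. It stores the power |F|^2. With 0-based indexing, the zero frequency lands at (size/2, size/2). The function names are hypothetical.

```python
import cmath
import math

def next_power_of_two(n):
    p = 1
    while p < n:
        p *= 2
    return p

def centred_power_spectrum(img):
    """Zero-pad to a square power-of-two size, take the DFT, and centre the zero frequency."""
    rows, cols = len(img), len(img[0])
    size = next_power_of_two(max(rows, cols))
    half = size // 2
    spectrum = [[0.0] * size for _ in range(size)]
    for u in range(size):
        for v in range(size):
            # padded pixels are zero, so only the original rows x cols terms contribute
            f = sum(img[x][y] * cmath.exp(-2j * math.pi * (u * x + v * y) / size)
                    for x in range(rows) for y in range(cols))
            # centring shift: frequency (u, v) is stored at ((u + half) % size, (v + half) % size)
            spectrum[(u + half) % size][(v + half) % size] = abs(f) ** 2
    return spectrum
```

For a 480x640 image, `next_power_of_two(640)` gives 1024, matching the padded size used in the thesis. For a 3x4 test image the sketch pads to 4x4, and the centre entry holds the squared sum of all pixel values, i.e. the DC power.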

Figure 24 plots the frequency responses of the unprocessed and the 
corresponding CLAHE-processed image shown in Figure 23. Clearly, there is an 
increased amount of higher frequency components, as shown by the higher 
spikes and color-coded profiles contained in the pictures, i.e. there is more power 
in the higher spatial frequencies. This observation is consistent with the
edges having been enhanced. Notice that the zero frequency is centered at the
location (512,512) as a result of the padding to 1024x1024.

2. Spectrum Power Distribution 

Next, the cumulative power distribution with respect to the distance from 
the center zero frequency (in terms of number of pixel count) is plotted to further 
examine the frequency power distribution. This computation is accomplished by 
superimposing a square window over the frequency spectrum and summing the 
power contained within it. The center of the square will overlie the zero frequency 
center and the distance will be equivalent to half the length of the square window. 
A contour plot of the frequency spectrum was created with MATLAB to illustrate 
the expanding window for computing the total amount of power, as shown in 
Figure 25. The contour plots also provide a different viewing aspect for 
comparing the frequency spectra of the processed and unprocessed images. 
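The expanding-window summation can be sketched as follows. The sketch assumes a square, centred power spectrum stored as a list of rows, such as the output of a padded and shifted FFT; the names are hypothetical. The share of power beyond a chosen half-width is the kind of percentage compared in the spectrum power distribution plots.

```python
def cumulative_power(spectrum):
    """Total power inside square windows of half-width d = 0, 1, ... about the centre.

    `spectrum` is a centred, square power spectrum (zero frequency at size // 2).
    """
    size = len(spectrum)
    c = size // 2
    totals = []
    for d in range(c + 1):
        lo, hi = max(0, c - d), min(size, c + d + 1)
        totals.append(sum(spectrum[r][k] for r in range(lo, hi) for k in range(lo, hi)))
    return totals

def high_frequency_share(spectrum, d):
    """Fraction of the total power lying outside the window of half-width d."""
    totals = cumulative_power(spectrum)
    return 1 - totals[d] / totals[-1]
```

For a flat 4x4 test spectrum of ones, the cumulative totals are `[1.0, 9.0, 16.0]`, and the share of power outside half-width 1 is 0.4375. A steeper rise of these totals at large d indicates relatively more power in the higher frequencies.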
Figure 24: Frequency spectrum plot of the unprocessed image and
the CLAHE processed image, showing an increase in the power of
higher frequency components. The maximum peak value is clipped at
5×10^5 to focus on the power distribution beyond the zero frequency.
Figure 25: Contour plots of the unprocessed image and the CLAHE
processed image. The summation process to compute the power 
distribution is as illustrated on the top image. 
[Plot panel: Spectrum power distribution]
Figure 26: Cumulative spectrum power distribution plots for six pairs
of images, showing an overall increase in total spectrum power and a
higher percentage of power in the higher frequencies.

The cumulative spectrum power distributions of the original and processed
images are plotted in Figure 26. A total of six image pairs were used to give an
indication of the trend in the distribution profile. Figure 26 shows that the total
spectrum power has been increased by the CLAHE process, which here translates
to increased brightness and contrast in the image. The rate of increase in the
cumulative power in the second half of the curves, i.e. the higher frequencies, is
also steeper for the CLAHE-processed images (the green dotted lines) than for
the original images, as illustrated by the gradient triangles in red. This
difference implies that there is a higher percentage of power contained in the
higher frequencies and indicates edge enhancement in the processed images.
Figure 27: Spectrum power distribution plot. The percentage of
power contained in the higher frequencies is higher for the CLAHE- 
processed image as shown by the green profile. 


Figure 27 shows that a higher percentage of power is contained in the
higher frequencies (from the 100th pixel onwards for this image) in the
CLAHE-processed image than in the original unprocessed image. Correspondingly,
the percentage of power in the lower frequencies is lower for the processed image,
since the percentages must sum to 100%. This is not a concern, as the vital
information, i.e. the edge content, is contained in the boosted higher frequencies.

In summary, the results presented in Figures 26 and 27 validate the
observation that the CLAHE process has enhanced the image edges and details,
as evident from the boosted higher spatial frequency components. The
CLAHE-enhanced images are therefore judged to be improved.
3. Histogram

The histogram of the CLAHE-processed image is compared with that of its
unprocessed version in Figure 28. The CLAHE-processed image has a more
evenly distributed and wider spread of gray levels, which translates to an
image with better contrast, as seen in the processed image in Figure 23. Since
the amplitude of a spatial frequency component depends on the degree of contrast
change, a larger contrast range in the histogram is linked to increased
spatial frequency components.


[Plot panel: Histogram Plot of Original Image; horizontal axis: Gray Level]

Figure 28: Comparison of the histograms of the unprocessed and
the CLAHE-processed image. The images are from Figure 23.
D. SUBJECTIVE ASSESSMENT

The eventual user of an image is still the human being. Theoretical figures 
of merit and engineering computations may be inadequate in predicting the 
human response. Hence, any image quality assessment should still be validated 
by human subjects for acceptance. 

1. Test Outline 

A subjective test was conducted to evaluate the image enhancement by
CLAHE. Fifteen students from the Naval Postgraduate School, aged 28 to 38
years, were approached for the test. Fifteen is the minimum number of test
subjects recommended by the International Telecommunication Union [ITU-R
BT.500-11, 2002]. All subjects participated voluntarily and signed informed consent
forms. Five of the subjects had no prior experience with night vision images or night
vision devices, while the remaining ten had experience with either night vision
goggles or thermal imagers.

Twenty image pairs, each consisting of one CLAHE-processed and one unprocessed
image of the same scene, were presented to the subjects on a Toshiba TECRA
9100® laptop with 32-bit color and a 1024x768 resolution setting. The brightness
setting of the laptop LCD was 50%, and the test was conducted in a dimly lit
room. Subjects were shown the same image pair in two consecutive sequences
and asked to indicate their preference as to which of the two images conveyed
more information or detail about the scene. “Most information” can be interpreted
as what allows the subject to see more objects (if any) or provides better
situational awareness of the scene. A choice of “neutral” could be entered when
the subject found that both images were comparable or that there was no
significant difference between the two. The display timing of the image sequence
was set as: three seconds (image 1), one second (blank screen), three seconds
(image 2), followed by a two-second pause before the same sequence was
repeated a second time. Each test lasted approximately 15 minutes.
The order of the processed and unprocessed images in the display sequence 
was randomized. Of the 20 image pairs, 5 were thermal images, while the rest 
were NVD or II images. The thermal image pairs were randomly interspersed 
among the II image pairs. These thermal images already have high contrast, so 
the CLAHE enhancement was expected to be insignificant or even to degrade 
image quality. The thermal images were therefore inserted to break any 
monotony of choice that might otherwise arise in the experiment.
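The presentation protocol described above can be sketched as a small schedule generator. This is a hypothetical Python illustration, not the software used in the test. The thesis does not say how the randomization was performed, so the following are all assumptions:

- the use of Python's random module;
- the seed;
- the names build_session and presentation_sequence.

The sketch randomizes which image of each pair is shown first. It places 5 thermal pairs at random positions among the 20 and applies the stated timings. In the actual test, the thermal pairs were pairs 5, 10, 12, 15 and 18 (see Table 4).

```python
import random

# Timings stated in the test outline, in seconds.
IMAGE_S, BLANK_S, PAUSE_S = 3, 1, 2


def presentation_sequence(first, second):
    """Image 1, blank, image 2, a pause, then the same three steps repeated."""
    once = [(first, IMAGE_S), ("blank", BLANK_S), (second, IMAGE_S)]
    return once + [("pause", PAUSE_S)] + once


def build_session(n_pairs=20, n_thermal=5, seed=None):
    """Randomize thermal-pair positions and the processed/unprocessed order."""
    rng = random.Random(seed)
    thermal = set(rng.sample(range(1, n_pairs + 1), n_thermal))
    session = []
    for pair in range(1, n_pairs + 1):
        order = ["processed", "unprocessed"]
        rng.shuffle(order)  # which image of the pair appears first
        session.append({
            "pair": pair,
            "type": "TI" if pair in thermal else "II",
            "steps": presentation_sequence(*order),
        })
    return session


session = build_session(seed=1)
per_pair = sum(seconds for _, seconds in session[0]["steps"])
print(per_pair, per_pair * len(session))  # 16 320 (seconds of display time)
```

With these timings, each pair takes 16 s of display time, or 320 s for all 20 pairs.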

2. Results 

Survey results are summarized in Table 2. On average, 75% of the subjects 
found the CLAHE-processed night vision images more informative and a more 
meaningful representation of the scene than the corresponding unprocessed 
images. This finding supports the proposition that the information content of the 
image is enhanced by the CLAHE process.

The majority of the subjects did not find the CLAHE-processed thermal 
images more informative. Only about 35% of the subjects preferred the processed 
thermal images. This may be because the thermal images provided by NRL 
already had very good contrast, so the additional contrast enhancement by 
CLAHE was not significant. In some cases, subjects commented that the 
processed image was “over-contrasted”, which made it look “unnatural” and made 
details difficult to identify. An example is shown in Figure 29. This image pair is 
pair number 10 in the subjective test, which received the lowest score.

The subjective test confirms that the CLAHE enhancement is effective on 
low-contrast night vision images. Thermal images generally have better contrast 
because AC coupling in the filtering process suppresses the background. 
However, low-contrast thermal images still occur, for example at dusk and dawn. 
At these times, differences in the thermal conductivity of object and background 
bring the background temperature close to the object temperature. The CLAHE 
process is therefore still applicable to thermal imagery.


Table 2. Subjective Test Results 
Figure 29: Unprocessed and processed thermal image pair, 
illustrating the minimal improvement by the CLAHE process. 

3. Observations and Comments 
a. No Objectivity in Images 

Most of the images obtained from the Naval Research Laboratory 
are outdoor scenes with no particular object to detect. The general feedback from 
the subjects was that it is difficult to judge the information content of an image 
without a specific object to look for, i.e., some detail that is visible only in the 
enhanced image and not in the original. Images with such characteristics would 
make the test more objective. A few subjects entered a “neutral” choice mainly 
because they could see the same amount of detail in both the original and the 
enhanced image. Both images contained the same information, even though the 
processed image appeared clearer. This explains the relatively lower scores for 
image pairs 6, 7, 8 and 14 (the images are available in Appendix B).

Hence, for future subjective testing, image pairs should be created 
(when the actual hardware is available) with one or more objects to detect. The 
objects could be obscured by low light or camouflage to reduce their contrast and 
visibility in the original night vision images. They should then become easier to 
see and detect after the CLAHE enhancement. A good example is the image pair 
formed by Figure 8 (original) and Figure 20 (CLAHE-processed). More ships can 
be seen after the enhancement, as agreed by 86.7% of the subjects.

b. Scanning versus Staring 

Some of the subjects found the display time too short for a proper 
assessment. This relates directly to the distinction between scanning and staring 
assessment. Scanning is concerned with wide-area surveillance, where the 
assessment time is short and the images are displayed in real time. In staring, 
the image display is static. What links the two is the time to detection: subjects 
are likely to take less time to detect an object when the image has better contrast. 
Hence, the time to detection could be another objective measure of image quality. 
However, this measure can only be explored when an object is implanted in the 
image, as discussed in the previous section.
c. Experience Factor 

Five of the fifteen subjects had no prior experience of viewing night vision 
images or using night vision devices. The two groups of subjects were analyzed 
separately. Among subjects with night vision experience, the average preference 
for the CLAHE-processed II images rises to 78% (Table 3). Among subjects without 
prior experience, it is only 69% (Table 4). Subjects in the group without experience 
reported that they found the enhanced noise and “graininess” in the 
CLAHE-processed images distracting, and they preferred the original unprocessed 
images. This noise is in fact inherited from the original image and hardware. 
Experienced subjects have already accepted it as a general characteristic of night 
vision images. Experience is therefore a factor in the test results and should not be 
overlooked, since the inexperienced group represents new users of night vision 
devices. There were also more “neutral” choices from the experienced subjects. 
This could be explained by the lack of objectivity in the test images discussed 
earlier.
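The group averages can be cross-checked against the overall figures given in the Results subsection. Both groups rated the same 20 pairs. The pooled percentage for each pair is therefore a subject-weighted mean of the two groups' percentages, and averaging over pairs preserves that weighting.

The Python sketch below recomputes the Table 4 averages from the "Processed" column. It then pools them with the Table 3 averages, which are taken as given because Table 3's per-pair rows are not reproduced here. The sketch assumes 10 experienced and 5 inexperienced subjects. The names used are illustrative only.

```python
# "Processed" preferences (% of subjects) of the five inexperienced subjects,
# copied from Table 4: pair number -> (image type, % preferring processed).
TABLE4 = {
    1: ("II", 60.0), 2: ("II", 40.0), 3: ("II", 100.0), 4: ("II", 60.0),
    5: ("TI", 40.0), 6: ("II", 60.0), 7: ("II", 60.0), 8: ("II", 80.0),
    9: ("II", 80.0), 10: ("TI", 20.0), 11: ("II", 60.0), 12: ("TI", 60.0),
    13: ("II", 80.0), 14: ("II", 60.0), 15: ("TI", 40.0), 16: ("II", 60.0),
    17: ("II", 80.0), 18: ("TI", 60.0), 19: ("II", 80.0), 20: ("II", 80.0),
}
TABLE3_AVG = {"II": 78.0, "TI": 30.0}  # experienced-group averages (Table 3)
N_EXPERIENCED, N_INEXPERIENCED = 10, 5


def group_average(table, kind):
    """Mean 'processed' preference over all pairs of one image type."""
    values = [pct for t, pct in table.values() if t == kind]
    return sum(values) / len(values)


def pooled_average(kind):
    """Subject-weighted mean of the two group averages."""
    inexperienced = group_average(TABLE4, kind)
    total = N_EXPERIENCED + N_INEXPERIENCED
    return (N_EXPERIENCED * TABLE3_AVG[kind] + N_INEXPERIENCED * inexperienced) / total


for kind in ("II", "TI"):
    print(kind, round(group_average(TABLE4, kind), 1), round(pooled_average(kind), 1))
# II 69.3 75.1
# TI 44.0 34.7
```

The pooled values agree with the 75% (II) and "about 35%" (TI) reported in the Results subsection.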

We recommend that future studies increase the number of subjects and 
include equal numbers of experienced and inexperienced viewers. This would 
allow a more accurate analysis of the acceptance of the CLAHE enhancement 
and of the influence of experience. A larger subject base would also better 
represent the population of users of night vision and thermal devices.

d. Original Image Quality 

Image pair 4 received a relatively low score for an II image. Examining 
the pair reveals that the original image already has reasonably good contrast, 
due to a light source in the sky. Hence, the enhancement by CLAHE was not 
significant. This is similar to the thermal image pairs, for which the most common 
response was a preference for the original image. The CLAHE enhancement may 
therefore not always be necessary.
Table 3. Subjective Test Results (with night vision experience) 

Average preference for CLAHE-processed II image (%): 78.0 
Average preference for CLAHE-processed TI image (%): 30.0 

Table 4. Subjective Test Results (without prior experience) 

Image preference (% of subjects) 

Image Pair  Type  Processed  Unprocessed  Neutral
1           II    60.0       20.0         20.0
2           II    40.0       60.0         -
3           II    100.0      -            -
4           II    60.0       40.0         -
5           TI    40.0       60.0         -
6           II    60.0       40.0         -
7           II    60.0       40.0         -
8           II    80.0       20.0         -
9           II    80.0       20.0         -
10          TI    20.0       60.0         20.0
11          II    60.0       40.0         -
12          TI    60.0       -
13          II    80.0       20.0         -
14          II    60.0       40.0         -
15          TI    40.0       40.0         20.0
16          II    60.0       40.0         -
17          II    80.0       20.0         -
18          TI    60.0       40.0         -
19          II    80.0       20.0         -
20          II    80.0       20.0         -

Average preference for CLAHE-processed II image (%): 69.3 
Average preference for CLAHE-processed TI image (%): 44.0 
IV. CONCLUSIONS AND RECOMMENDATIONS 


A. SUMMARY 

The CLAHE algorithm is a digital contrast enhancement technique that 
emphasizes local details in the image while limiting noise amplification. It works 
by clipping and then equalizing the histogram of each local region. This is 
followed by bilinear interpolation between neighboring regions.
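The two ingredients named above can be sketched in Python for a single region. This is a minimal illustration under stated assumptions, not the implementation used to produce the images in this thesis. The sketch makes the following assumptions:

- gray levels are 8-bit;
- the clip limit is an absolute number of pixel counts per bin;
- the clipped excess is redistributed uniformly in a single pass;
- tiling and border handling are omitted.

All function names are hypothetical. The function bilinear_map shows how the mappings of the four nearest regions could be blended for one pixel, which is what suppresses visible block boundaries between regions.

```python
def clip_histogram(hist, clip_limit):
    """Clip every bin at clip_limit and redistribute the excess counts.

    Single pass: redistributed counts may push a bin slightly above the limit.
    """
    n = len(hist)
    excess = sum(max(h - clip_limit, 0) for h in hist)
    share, remainder = divmod(excess, n)
    result = [min(h, clip_limit) + share for h in hist]
    if remainder:
        step = n // remainder          # remainder < n, so step >= 1
        for i in range(remainder):
            result[i * step] += 1      # spread leftover counts across the range
    return result


def equalization_lut(hist, levels):
    """Map each gray level to (levels - 1) * CDF(level), rounded half up."""
    total = sum(hist)
    lut, cumulative = [], 0
    for h in hist:
        cumulative += h
        lut.append(int((levels - 1) * cumulative / total + 0.5))
    return lut


def region_lut(pixels, levels=256, clip_limit=None):
    """Gray-level mapping for one region, optionally contrast limited."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    if clip_limit is not None:
        hist = clip_histogram(hist, clip_limit)
    return equalization_lut(hist, levels)


def bilinear_map(value, luts, wx, wy):
    """Blend the four nearest regions' mappings for one pixel.

    luts: (top_left, top_right, bottom_left, bottom_right) lookup tables;
    wx, wy: the pixel's fractional position (0..1) between region centres.
    """
    tl, tr, bl, br = (lut[value] for lut in luts)
    top = (1 - wx) * tl + wx * tr
    bottom = (1 - wx) * bl + wx * br
    return (1 - wy) * top + wy * bottom


# Hypothetical low-contrast region that uses only gray levels 100-103.
region = [100] * 40 + [101] * 30 + [102] * 20 + [103] * 10
plain = region_lut(region)                   # ordinary histogram equalization
limited = region_lut(region, clip_limit=4)   # contrast-limited version
print(plain[100], plain[103])                # 102 255
print(limited[100], limited[103])            # 97 130
```

Plain equalization spreads the four occupied gray levels over about 150 output levels. With the clip limit, they are spread over only about 33 levels. Capping the slope of the mapping in this way is how CLAHE limits noise amplification in nearly uniform regions.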

The contrast enhancement by CLAHE has been found to be visually significant, 
and object detection is improved by the higher image contrast. Examining the 
frequency response of the enhanced image reveals increases in the higher spatial 
frequencies. Higher spatial frequencies correspond to edges in the image, so the 
increase in power represents an enhancement of the edges and hence an increase 
in visible image detail. We also conducted a subjective test in which the majority of 
the human subjects indicated that the CLAHE-enhanced images were more 
informative than the original night vision images.

The results indicated that the CLAHE process is effective in enhancing 
low-contrast images. However, the improvement is limited for images with good 
initial contrast, such as the thermal images in this study. Nevertheless, TI images 
can still suffer from low contrast at certain times of day, especially at dusk and 
dawn. Therefore, the CLAHE enhancement scheme is applicable to both night 
vision devices (Image Intensifiers) and Thermal Imagers. The enhancement would 
be particularly attractive for Image Intensifiers, since they are cheaper and more 
compact and their main handicap is low-contrast imagery.

The CLAHE process can be implemented as a computer algorithm or as a 
hardware electronic chip in the interface between the sensor and the display. No 
modification to the sensor itself is required. The enhancement can also run in real 
time, as CLAHE processing is not computationally demanding. An on/off switch or 
option for the process is still needed, as not all subjects found the enhancement 
beneficial at all times.
B. RECOMMENDATION FOR FURTHER RESEARCH 


1. Subjective Test with Object Detection 

A new set of matching night vision and thermal images containing specific 
objects should be created. The objects should be at the threshold of visibility in 
the unprocessed image and should become detectable after the CLAHE 
enhancement. These image pairs can then be used in a larger, more extensive 
subjective test to determine the time to detection of these objects. Such a test 
would help quantify the CLAHE improvement more objectively and could justify its 
implementation cost.

2. Image Fusion 

CLAHE-enhanced night vision images can be fused with their thermal 
counterparts (with or without enhancement). Any further improvement in image 
quality can then be assessed using the same frequency evaluation and subjective 
testing. One candidate fusion algorithm is the nonlinear method proposed by 
Scrofani et al. (1997).
APPENDIX A: MATLAB ALGORITHMS 


This Appendix contains the following MATLAB source files: 

1. Histogram equalization (Test8_hist_equal.m). 

2. Frequency spectrum plot (Test13_power.m). 
% Test8_hist_equal.m 
% Histogram equalization 
% =========================== 

% The input file name has to be set manually in the m-file before it is run. 
% The output consists of four histogram plots, the original image 
% and the processed image. 

% =========================== 

Aii = imread('21-l.tif');   % input test image 21-l.tif (8-bit grayscale) 
Aorg = Aii;                 % keep a copy of the original image 

graylvl = 256;              % number of gray levels, typically 256 
lvl = graylvl - 1;          % highest gray level 

disp('Generating histogram.'); 

% ===== Generate histogram count ===== 
n_count = zeros(1, graylvl); 
for k = 1:graylvl 
    n_count(k) = sum(Aii(:) == k-1); 
end 

r = 0:lvl;                  % gray levels from 0 to 255 
r_norm = r./lvl;            % normalized gray levels 
total = sum(n_count);       % total pixel count 
pdf = n_count./total;       % probability distribution 

s_cdf = cumsum(pdf);        % cumulative distribution function 

s_int = s_cdf.*lvl;         % rescale back to gray-level values 
s_lvl = round(s_int) + 1;   % new gray level, stored as a 1-based index 
                            % (+1 accounts for gray level 0 in the 1st column) 

s_new = zeros(size(n_count)); 

disp('Equalising.'); 

% ===== Combine counts of old gray levels mapped to the same new level ===== 
for count = 1:graylvl 
    s_new(s_lvl(count)) = s_new(s_lvl(count)) + n_count(count); 
end 

s_new = s_new./total;       % normalized new histogram 

disp('Transforming image.'); 

% ===== Remap gray levels in image (any image size) ===== 
Aii = uint8(s_lvl(double(Aii) + 1) - 1); 

% ===== Counter-check gray-level transformation for equalization ===== 
n_check = zeros(1, graylvl); 
for k = 1:graylvl 
    n_check(k) = sum(Aii(:) == k-1); 
end 

disp('....done.'); 

% ===== Plot histograms ===== 
figure(1) 
subplot(2,2,1), bar(r, n_count), title('Original histogram'), axis tight; 
subplot(2,2,2), bar(r_norm, s_cdf), title('Cdf'), axis tight; 
subplot(2,2,3), bar(r_norm, s_new), title('Equalized histogram'), axis tight; 
subplot(2,2,4), bar(r, n_check), title('Equalized histogram 2'), axis tight; 

% The 3rd histogram is normalized and serves as a counter-check for the 4th 
% histogram. 

figure(2) 
imshow(Aorg); 
title('Original image') 

figure(3) 
imshow(Aii); 
title('Resultant image') 

% ===== end ===== 
% Test13_power.m 
% Plot the spectrum power distribution 
% =========================== 

% The input file names have to be set manually in the m-file before it is run. 

% Input A is the original image, while input B is the CLAHE-processed image. 
% The first figure output is the cumulative spectrum power plot. 

% The second figure output is the spectrum power distribution. 

% =========================== 

clear; 

Aii = imread('25-l.tif');     % input original image 
Bii = imread('25-lah.tif');   % input CLAHE image 

Afft = fft2(double(Aii), 1024, 1024);  % fast Fourier transform with zero padding 
A2 = abs(fftshift(Afft));              % center zero frequency, take magnitude 

Bfft = fft2(double(Bii), 1024, 1024);  % fast Fourier transform with zero padding 
B2 = abs(fftshift(Bfft));              % center zero frequency, take magnitude 

% ===== Original image: find center of spectrum (DC peak) ===== 
[~, x] = max(max(A2, [], 1));          % column index of the peak 
[~, y] = max(max(A2, [], 2));          % row index of the peak 

A_total = sum(A2(:)); 
m = size(A2, 1); 
dim_max = m - y;                       % largest half-width of the expanding square 

% expanding square and sum 
A_arrayc = zeros(1, dim_max + 1);      % cumulative sum inside each square 
A_arrayc(1) = A2(y, x); 
for dim = 1:dim_max 
    A_arrayc(dim+1) = sum(sum(A2(y-dim:y+dim, x-dim:x+dim))); 
end 
A_array = [A_arrayc(1), diff(A_arrayc)];   % contribution of each square ring 

% ===== CLAHE image: same procedure ===== 
[~, xb] = max(max(B2, [], 1)); 
[~, yb] = max(max(B2, [], 2)); 

B_total = sum(B2(:)); 
mb = size(B2, 1); 
dim_maxb = mb - yb; 

B_arrayc = zeros(1, dim_maxb + 1); 
B_arrayc(1) = B2(yb, xb); 
for dimb = 1:dim_maxb 
    B_arrayc(dimb+1) = sum(sum(B2(yb-dimb:yb+dimb, xb-dimb:xb+dimb))); 
end 
B_array = [B_arrayc(1), diff(B_arrayc)]; 

% === Plot cumulative spectrum power distribution === 
figure; 
plot(0:dim_max, A_arrayc./A_total, 0:dim_maxb, B_arrayc./B_total) 
legend('Original', 'CLAHE') 

% === Plot power distribution === 
figure; 
plot(0:dim_max, A_array./A_total, 0:dim_maxb, B_array./B_total) 
legend('Original', 'CLAHE') 

% May have to zoom in on the y axis for a better view of the distribution 

% ==== end ===== 
APPENDIX B: CLAHE ENHANCED IMAGES 


The following images are results obtained with the Contrast Limited 
Adaptive Histogram Equalization (CLAHE) enhancement algorithm. The images 
in the left column are the original unprocessed night vision images, while the 
images in the right column are the CLAHE-processed images. These image pairs 
were used in the subjective testing to assess the improvement by the CLAHE 
method. The numbering of the image pairs is the same as that used in the 
subjective test.